Low-complexity design for FRUC
Patent abstract:
A method for decoding video data includes building, using a video decoder implemented in processing circuitry, a candidate list of motion vector information for a portion of a current frame. The method includes receiving, via the video decoder, signaling information indicating initial motion vector information from the candidate list of motion vector information, the initial motion vector information indicating a starting position in a reference frame. The method includes refining, by the video decoder, based on one or more of bilateral matching or template matching, the initial motion vector information to determine refined motion vector information that indicates a refined position in the reference frame that is within a search range from the starting position. The method includes generating, by the video decoder, a predictive block based on the refined motion vector information and decoding, by the video decoder, the current frame based on the predictive block.
Publication number: BR112020006875A2
Application number: R112020006875-6
Filing date: 2018-09-17
Publication date: 2020-10-06
Inventors: Wei-Jung Chien; Hsiao-Chiang Chuang; Xiang Li; Jianle Chen; Li Zhang; Marta Karczewicz
Applicant: Qualcomm Incorporated
IPC main classification:
Patent description:
[0001] This application claims priority to U.S. patent application No. 16/131,860, filed on September 14, 2018, and claims the benefit of U.S. provisional patent application No. 62/571,161, filed on October 11, 2017, the full content of which is hereby incorporated by reference. TECHNICAL FIELD [0002] This disclosure relates to video encoding and decoding. BACKGROUND [0003] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, satellite or cellular radio telephones, so-called "smart phones", video teleconferencing devices, video streaming devices and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), the ITU-T H.265 standard, High Efficiency Video Coding (HEVC), and extensions of such standards. Video devices can transmit, receive, encode, decode and/or store digital video information more efficiently by implementing such video compression techniques. [0004] Video compression techniques perform spatial prediction (intra-image) and/or temporal prediction (inter-image) to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (that is, a video frame or a portion of a video frame) can be divided into video blocks, which can also be called tree blocks, coding units (CUs) and/or coding nodes. The video blocks in an intra-coded slice (I) of an image are encoded using spatial prediction relative to reference samples in neighboring blocks in the same image. Video blocks in an inter-coded slice (P or B) of an image can use spatial prediction relative to reference samples in neighboring blocks in the same image or temporal prediction relative to reference samples in other reference images. The temporal or spatial prediction results in a predictive block for a block to be coded. Residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is coded according to a motion vector that points to a block of reference samples that form the predictive block, and residual data that indicates the difference between the coded block and the predictive block. An intra-coded block is coded according to an intra-coding mode and residual data. For additional compression, the residual data can be transformed from the pixel domain into a transform domain, resulting in residual transform coefficients that can then be quantized. SUMMARY [0005] In general, this disclosure describes techniques related to improvements on existing techniques for Frame Rate Up-Conversion (FRUC). U.S. patent publication No. US-2016-0286230 describes FRUC-based techniques. The techniques of this disclosure for low-complexity FRUC can be applied to any of the existing video codecs, such as HEVC (High Efficiency Video Coding), or can be an efficient coding tool for future video coding standards, such as the video coding standard currently under development.
More particularly, this disclosure describes techniques related to reducing the number of reference samples fetched from external memory to perform search operations for FRUC. [0006] In one example, a method for decoding video data includes building, using a video decoder implemented in processing circuitry, a candidate list of motion vector information for a portion of a current frame, receiving, by the video decoder, signaling information indicating initial motion vector information from the candidate list of motion vector information, the initial motion vector information indicating a starting position in a reference frame, refining, by the video decoder, based on one or more of bilateral matching or template matching, the initial motion vector information to determine refined motion vector information that indicates a refined position in the reference frame that is within a search range from the starting position, generating, by the video decoder, a predictive block based on the refined motion vector information, and decoding, by the video decoder, the current frame based on the predictive block. [0007] In another example, a device for decoding video data includes a memory configured to store the video data and processing circuitry. The processing circuitry is configured to build a candidate list of motion vector information for a portion of a current frame, receive signaling information indicating initial motion vector information from the candidate list of motion vector information, where the initial motion vector information indicates a starting position in a reference frame, refine, based on one or more of bilateral matching or template matching, the initial motion vector information to determine refined motion vector information that indicates a refined position in the reference frame that is within a search range from the starting position, generate a predictive block based on the refined motion vector information, and decode the current frame based on the predictive block. [0008] In another example, a non-transitory computer-readable medium is configured with one or more instructions that, when executed, cause one or more processors to build a candidate list of motion vector information for a portion of a current frame, receive signaling information indicating initial motion vector information from the candidate list of motion vector information, the initial motion vector information indicating a starting position in a reference frame, refine, based on one or more of bilateral matching or template matching, the initial motion vector information to determine refined motion vector information that indicates a refined position in the reference frame that is within a search range from the starting position, generate a predictive block based on the refined motion vector information, and decode the current frame based on the predictive block.
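The decoding flow summarized in the preceding examples can be illustrated with a short sketch. The following Python fragment is a simplified model only, not the method of this disclosure or of any standard: the template-matching cost, the integer-only search, the names SEARCH_RANGE and TEMPLATE, and the assumption that the search window and templates lie inside the picture are all illustrative choices.

```python
import numpy as np

SEARCH_RANGE = 8   # assumed refinement range, in integer samples, around the starting position
TEMPLATE = 4       # assumed thickness of the above/left template used for matching


def template_cost(ref, reconstructed, block_pos, mv, size):
    """SAD between the already reconstructed samples above and to the left of the
    current block and the corresponding samples around the candidate position in
    the reference frame (a stand-in for a template matching cost)."""
    y, x = block_pos
    ry, rx = y + mv[0], x + mv[1]
    cur_top = reconstructed[y - TEMPLATE:y, x:x + size].astype(np.int32)
    ref_top = ref[ry - TEMPLATE:ry, rx:rx + size].astype(np.int32)
    cur_left = reconstructed[y:y + size, x - TEMPLATE:x].astype(np.int32)
    ref_left = ref[ry:ry + size, rx - TEMPLATE:rx].astype(np.int32)
    return int(np.abs(cur_top - ref_top).sum() + np.abs(cur_left - ref_left).sum())


def decode_block(candidate_list, signaled_index, ref, reconstructed, block_pos, size, residual):
    """Only the signaled initial motion vector is refined, so reference samples are
    needed for a single search window instead of one window per candidate."""
    init_mv = candidate_list[signaled_index]          # initial motion vector information
    best_mv, best_cost = init_mv, None
    for dy in range(-SEARCH_RANGE, SEARCH_RANGE + 1):
        for dx in range(-SEARCH_RANGE, SEARCH_RANGE + 1):
            mv = (init_mv[0] + dy, init_mv[1] + dx)   # refined position within the search range
            cost = template_cost(ref, reconstructed, block_pos, mv, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = mv, cost
    # Generate the predictive block from the refined motion vector and add the decoded residual.
    y, x = block_pos
    py, px = y + best_mv[0], x + best_mv[1]
    prediction = ref[py:py + size, px:px + size].astype(np.int32)
    return np.clip(prediction + residual, 0, 255).astype(np.uint8)
```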
[0009] In another example, a device comprises means for building a candidate list of motion vector information for a portion of a current frame, receiving signaling information indicating initial motion vector information from the candidate list of motion vector information, where the initial motion vector information indicates a starting position in a reference frame, refining, based on one or more of bilateral matching or template matching, the initial motion vector information to determine refined motion vector information that indicates a refined position in the reference frame that is within a search range from the starting position, generating a predictive block based on the refined motion vector information, and decoding the current frame based on the predictive block. [0010] In another example, a method for encoding video data includes building, using a video encoder implemented in processing circuitry, a candidate list of motion vector information for a portion of a current frame, selecting, using the video encoder, initial motion vector information from the candidate list of motion vector information, the initial motion vector information indicating a starting position in a reference frame, refining, by the video encoder, based on one or more of bilateral matching or template matching, the initial motion vector information to determine refined motion vector information that indicates a refined position in the reference frame that is within a search range from the starting position, generating, by the video encoder, a predictive block based on the refined motion vector information, generating, by the video encoder, residual sample values for the current block of video data based on the predictive block, and outputting, by the video encoder, an indication of the residual sample values and signaling information that indicates the initial motion vector information from the candidate list of motion vector information. [0011] In another example, a device for encoding video data includes a memory configured to store the video data and processing circuitry. The processing circuitry is configured to build a candidate list of motion vector information for a portion of a current frame, select initial motion vector information from the candidate list of motion vector information, where the initial motion vector information indicates a starting position in a reference frame, refine, based on one or more of bilateral matching or template matching, the initial motion vector information to determine refined motion vector information that indicates a refined position in the reference frame that is within a search range from the starting position, generate a predictive block based on the refined motion vector information, generate residual sample values for the current block of video data based on the predictive block, and output an indication of the residual sample values and signaling information that indicates the initial motion vector information from the candidate list of motion vector information.
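Because each of the examples above starts by building a candidate list of motion vector information, a minimal sketch of such a construction is given below. It is loosely modeled on the merge list described later in this disclosure (spatial neighbors, then a temporal candidate, then pruning and zero padding); the neighbor availability, the list size and the function name are assumptions used only for illustration.

```python
MAX_CANDIDATES = 5  # the merge list described later in this disclosure holds up to five candidates


def build_candidate_list(spatial_neighbors, temporal_candidate):
    """Illustrative construction of a motion vector candidate list: spatial
    neighbours first, then the temporal candidate, with duplicate entries
    pruned and zero-vector padding at the end."""
    candidates = []
    for mv in spatial_neighbors:          # e.g. MVs of neighbours A1, B1, B0, A0, B2; None if unavailable
        if mv is not None and mv not in candidates:
            candidates.append(mv)
    if temporal_candidate is not None and temporal_candidate not in candidates:
        candidates.append(temporal_candidate)
    while len(candidates) < MAX_CANDIDATES:
        candidates.append((0, 0))         # artificial zero candidates fill the list
    return candidates[:MAX_CANDIDATES]


# Example: three available spatial neighbours and one temporal candidate.
candidate_list = build_candidate_list(
    spatial_neighbors=[(2, -1), None, (2, -1), (0, 3), None],
    temporal_candidate=(1, 1),
)
print(candidate_list)   # [(2, -1), (0, 3), (1, 1), (0, 0), (0, 0)]
```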
[0012] In another example, a non-transitory computer-readable medium is configured with one or more instructions that, when executed, cause one or more processors to build a candidate list of motion vector information for a portion of a current frame, select initial motion vector information from the candidate list of motion vector information, the initial motion vector information indicating a starting position in a reference frame, refine, based on one or more of bilateral matching or template matching, the initial motion vector information to determine refined motion vector information that indicates a refined position in the reference frame that is within a search range from the starting position, generate a predictive block based on the refined motion vector information, generate residual sample values for the current block of video data based on the predictive block, and output an indication of the residual sample values and signaling information that indicates the initial motion vector information from the candidate list of motion vector information. [0013] In another example, a device comprises means for building a candidate list of motion vector information for a portion of a current frame, selecting initial motion vector information from the candidate list of motion vector information, where the initial motion vector information indicates a starting position in a reference frame, refining, based on one or more of bilateral matching or template matching, the initial motion vector information to determine refined motion vector information that indicates a refined position in the reference frame that is within a search range from the starting position, generating a predictive block based on the refined motion vector information, generating residual sample values for the current block of video data based on the predictive block, and outputting an indication of the residual sample values and signaling information that indicates the initial motion vector information from the candidate list of motion vector information. [0014] Details of one or more aspects of the disclosure are set forth in the accompanying drawings and in the description below. Other features, objects and advantages of the techniques described in this disclosure will be apparent from the description, drawings and claims. BRIEF DESCRIPTION OF THE DRAWINGS [0015] Figure 1 is a block diagram illustrating an example video encoding and decoding system that can use one or more techniques described in this disclosure. [0016] Figure 2A is a conceptual diagram that illustrates spatial neighboring MV candidates for merge mode. [0017] Figure 2B is a conceptual diagram that illustrates spatial neighboring MV candidates for AMVP mode. [0018] Figure 3A is a first conceptual diagram that illustrates temporal motion vector prediction in HEVC. [0019] Figure 3B is a second conceptual diagram that illustrates temporal motion vector prediction in HEVC. [0020] Figure 4 is a conceptual diagram that illustrates unilateral ME in FRUC. [0021] Figure 5 is a conceptual diagram that illustrates bilateral ME in FRUC. [0022] Figure 6 is a conceptual diagram that illustrates DMVD based on template matching. [0023] Figure 7 is a conceptual diagram illustrating mirror-based bidirectional MV derivation in DMVD.
[0024] [0024] Figure 8A is a conceptual diagram that illustrates derivation of motion vector based on extended bilateral correspondence. [0025] [0025] Figure 8B is a block diagram illustrating PU decoding with added pu dmvd flag. [0026] [0026] Figure 9 is a conceptual diagram that illustrates bilateral correspondence. [0027] [0027] Figure 10 is a conceptual diagram that illustrates model correspondence. [0028] [0028] Figure 11 is a conceptual diagram that illustrates neighboring samples used to derive CI parameters. [0029] [0029] Figure 12 is a conceptual diagram that illustrates DMVD based on bilateral model correspondence. [0030] [0030] Figure 13 is a block diagram that illustrates an example video encoder that can implement one or more techniques described in this disclosure. [0031] [0031] Figure 14 is a block diagram that illustrates an example video decoder that can implement one or more techniques described in this disclosure. [0032] [0032] Figure 15 is a block diagram that illustrates an exemplary operation of a video decoder according to one or more techniques described in this disclosure. [0033] [0033] Figure 16 is a block diagram that illustrates an exemplified operation for a video encoder, according to one or more techniques described in this disclosure. DETAILED DESCRIPTION [0034] [0034] The techniques of this disclosure refer to the derivation of movement information on the decoder side, block partition and / or interpolation of video data in block-based video encoding. The techniques can be applied to any of the existing video codecs, such as high efficiency video encoding (HEVC), or be an effective encoding tool for any future video encoding standards. [0035] [0035] Video encoding devices implement video compression techniques to effectively encode and decode video data. Video compression techniques may include the application of spatial prediction (eg, intraframe prediction), temporal prediction (eg, interframe prediction) and / or other prediction techniques to reduce or remove inherent redundancy in sequences of video. A video encoder typically partitions each image from an original video sequence into rectangular regions, referred to as video blocks or encoding units (described in more detail below). These video blocks can be encoded using a particular prediction mode. [0036] [0036] For inter-prediction modes, a video encoder typically looks for a block similar to the one that is encoded in a frame at another time point, called a frame of reference. The video encoder can restrict the search to a certain spatial displacement from the block to be encoded. A better match can be found with the use of a two-dimensional (2D) motion vector that includes a horizontal displacement component and a vertical displacement component. For an intra-prediction mode, a video encoder can form the predicted block using spatial prediction techniques based on data from neighboring blocks previously encoded within the same image. [0037] [0037] The video encoder can determine a prediction error, that is, the difference between the pixel values in the block that is encoded and in the predicted block (also called as residual). The video encoder can also apply a transform to the prediction error, such as a discrete cosine transform (DCT), to generate transformation coefficients. After the transformation, the video encoder can quantify the transform coefficients. 
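As a concrete illustration of the prediction error, transform and quantization steps just described, the sketch below forms a residual, applies an orthonormal 2-D DCT from SciPy and quantizes with a single flat step size; the block contents and the step size are arbitrary assumptions rather than values taken from any coding standard.

```python
import numpy as np
from scipy.fft import dctn, idctn

# Assumed 8x8 original block and its prediction (arbitrary example data).
original = np.random.randint(0, 256, (8, 8)).astype(np.int32)
predicted = np.clip(original + np.random.randint(-3, 4, (8, 8)), 0, 255).astype(np.int32)

# Prediction error (residual): pixel differences between the original and predicted block.
residual = original - predicted

# Transform the residual (here a 2-D DCT) to obtain transform coefficients ...
coefficients = dctn(residual, type=2, norm="ortho")

# ... and quantize them with a single, arbitrary step size.
step = 4.0
quantized = np.round(coefficients / step).astype(np.int32)

# A decoder would invert these steps (with quantization loss) to rebuild the block.
reconstructed_residual = idctn(quantized * step, type=2, norm="ortho")
reconstructed = np.clip(predicted + np.round(reconstructed_residual), 0, 255).astype(np.int32)
```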
The quantized transform coefficients and motion vectors can be represented using syntax elements and, together with the control information, form a coded representation of a video sequence. In some cases, the video encoder can perform the entropy of code syntax elements, further reducing the number of bits required for their representation. [0038] [0038] A video decoder can, using the syntax elements and control information discussed above, build predictive data (for example, a predictive block) to decode a current frame. For example, the video decoder can add the predicted block and the compressed prediction error. The video decoder can determine the compressed prediction error by weighting the functions of the transform base using quantified coefficients. The difference between the reconstructed frame and the original frame is called a reconstruction error. [0039] [0039] In some cases, a video decoder or post-processing device may interpolate images based on one or more reference images. Such interpolated images are not included in an encoded bit stream. The video decoder or post-processing device can interpolate images to upwardly convert an original frame rate from an encoded bit stream. This process can be referred to as upward conversion of frame rate (FRUC). Alternatively or additionally, the video decoder or post-processing device can interpolate images to insert one or more images that have been ignored by a video encoder to encode a video sequence at a reduced frame rate. In both cases, the video decoder or post-processing device interpolates frames that are not included in an encoded bit stream that was received by the video decoder. The video decoder or post-processing device can interpolate the images using any one of several interpolation techniques, for example, using motion compensated frame interpolation, frame repetition or average frame calculation. [0040] [0040] Although certain techniques for interpolating images have been used for purposes of upward conversion, such techniques were not widely used during video encoding, for example, for encoding video data that is included in an encoded bit stream. For example, techniques for interpolating images can be relatively time consuming and / or require a relatively large amount of processing power. Consequently, such techniques have typically not been performed in a loop by decoding video data. [0041] [0041] According to one or more techniques described in this document, instead of retrieving reference samples from external memory to perform a search for each motion vector from a candidate list of motion vector information (for example , start motion vectors), a video decoder can retrieve only samples from external memory to perform a search for initial motion vector information from the candidate list of motion vector information that is signaled by a motion encoder. video. In this way, the video decoder can reduce the amount of reference samples used from the external memory to perform the search, thereby reducing the amount of energy used to derive motion information from the decoder side. For example, configuring a video decoder to receive signaling information indicating initial motion vector information from a candidate list of motion vector information and to refine the initial motion vector information can reduce an amount of energy used to derive motion information from the decoder side. 
In some examples, configuring a video encoder to select initial motion vector information from a candidate list of motion vector information and to issue an indication of signaling information indicating the initial motion vector information from the list of motion vector information candidates can reduce an amount of energy used to derive motion information from the decoder side. [0042] [0042] As used in this disclosure, the term video-to-code conversion refers generically to video encoding or video decoding. Similarly, the term video-to-code converter can generally refer to a video encoder or a video decoder. In addition, certain techniques described in this disclosure regarding video decoding may also apply to video encoding and vice versa. For example, video encoders and video decoders are often configured to perform the same or reciprocal processes. In addition, video encoders typically perform video decoding as part of the process of determining how to encode video data. [0043] [0043] Figure 1 is a block diagram illustrating an example video encoding and decoding system 10 that can use the FRUC techniques of this disclosure. As shown in Figure 1, system 10 includes a source device 12 that provides encoded video data to be further decoded by a target device 14. In particular, source device 12 provides video data to the target device 14 via a computer-readable medium 16. The source device 12 and the target device 14 can include any of a wide range of devices, including desktop computers, notebook computers (ie, laptop computers) , tablet computers, signal decoders, telephone devices such as so-called “smart” phones, tablet computers, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices or similar. In some cases, the source device 12 and the destination device 14 may be equipped for wireless communication. In this way, the source device 12 and the destination device 14 can be wireless communication devices. The source device 12 is an exemplary video encoding device (i.e., a device for encoding video data). The target device 14 is an exemplary video decoding device (i.e., a device for decoding video data). [0044] [0044] In the example in Figure 1, the source device 12 includes a video source 18, storage media 19 configured to store video data, a video encoder 20 and an output interface 24. The destination device 14 includes an input interface 26, storage media 28 configured to store encoded video data, a video decoder 30 and display device 32. In other examples, source device 12 and destination device 14 include other components or arrangements. For example, source device 12 can receive video data from an external video source, such as an external camera. Similarly, the target device 14 can interface with an external display device, instead of including an integrated display device. [0045] [0045] The illustrated system 10 of Figure 1 is just an example. Techniques for processing video data can be performed by any digital video encoding and / or decoding device. Although, in general, the techniques of this disclosure are performed by a video encoding device, the techniques can also be performed by a video encoder / decoder, typically referred to as a "CODEC". The source device 12 and the target device 14 are just examples of such encoding devices where the source device 12 generates encoded video data for transmission to the target device 14. 
In some examples, The source device 12 and the target device 14 can operate in a substantially symmetrical manner so that each of the source device 12 and the target device 14 includes video encoding and decoding components. Therefore, system 10 can support unidirectional or bidirectional video transmission between the source device 12 and the destination device 14, for example, for video streaming, video playback, video broadcasting or video telephony. [0046] [0046] The video source 18 of the source device 12 may include a video capture device, such as a video camera, a video file that contains previously captured video and / or a video feed interface for receiving data from video from a video content provider. As an additional alternative, video source 18 can generate data based on computer graphics such as source video, or a combination of live video, archived video and computer generated video. The source device 12 may comprise one or more data storage media (e.g., storage media 19) configured to store the video data. The techniques described in the present disclosure can be applicable to video encoding in general and can be applied to wired and / or wireless applications. In each case, the captured, pre-captured or computer-generated video can be encoded by the video encoder 20. The output interface 24 can output the encoded video information to a computer-readable medium 16. [0047] [0047] The target device 14 can receive the encoded video data to be decoded via the computer-readable medium 16. The computer-readable media 16 can comprise any type of media or device capable of moving the encoded video data to from the source device 12 to the target device 14. In some examples, computer-readable media 16 comprises communication media to enable the source device 12 to transmit encoded video data directly to the target device 14 in time real. The encoded video data can be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the destination device 14. The communication medium can comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. Communication media can form part of a packet-based network, such as a local area network, a wide area network, or a global network, such as the Internet. The communication medium may include routers, switches, base stations or any other equipment that may be useful to facilitate communication from the source device 12 to the destination device 14. The destination device 14 can comprise one or more media data storage devices configured to store encoded video data and decoded video data. [0048] [0048] In some examples, encrypted data can be output from the output interface 24 to a storage device. Similarly, encrypted data can be accessed from the storage device via the input interface. The storage device can include any of a variety of data storage media distributed or accessed locally such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other media digital storage suitable for storing encoded video data. In an additional example, the storage device can correspond to a file server or another intermediate storage device that can store the encoded video generated - by the source device 12. [) destination device 14 can access the stored video data from the storage device via streaming or download. 
The file server can be any type of server capable of storing encoded video data and transmitting that encoded video data to the target device 14. The exemplary file servers include a web server (for example, for a website ), an FTP server, network storage devices (NAS), or a local disk drive. The target device 14 can access the encoded video data through any standard data connection, including an Internet connection. This can include a wireless channel (for example, a Wi-Fi connection), a wired connection (for example, DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device can be a continuous transmission, a downloadable transmission or a combination thereof. [0049] [0049] The techniques can be applied to video encoding in support of any of a variety of multimedia applications, such as broadcast television broadcasts, cable television broadcasts, satellite television broadcasts, streaming video broadcasts via the Internet, such as dynamic adaptive streaming over HTTP (DASH), digital video that is encoded on a data storage medium, decoding of digital video stored on a data storage medium or other applications. In some instances, system 10 can be configured to support unidirectional or bidirectional video transmission to support applications such as video streaming, video playback, video broadcasting and / or video telephony. [0050] [0050] Computer-readable media 16 may include transient media, such as a wired network transmission or wireless broadcast, or storage media (that is, non-transient storage media), such as a hard disk, magnetic disk, disk compact disc, digital video disc, Blu-ray disc or other computer-readable media. In some examples, a network server (not shown) can receive encoded video data from the source device 12 and provide the encoded video data to the destination device 14, for example, via network transmission. Similarly, a computing device of a media production facility, such as a disc embossing facility, can receive encoded video data from the source device 12 and produce a disc containing the encoded video data. Therefore, computer-readable media 16 can be understood to include one or more computer-readable media in several ways, in several examples. [0051] [0051] Input interface 26 of destination device 14 receives information from computer-readable medium 16. Computer-readable medium information 16 may include syntax information defined by video encoder 20 of video encoder 20, which they are also used by the video decoder 30, which includes elements of syntax that describe characteristics and / or processing of blocks and other encoded units, for example, image groups (GOPs). The storage media 28 can be configured to store encoded video data, such as encoded video data (for example, a bit stream) received by the input interface 26. The display device 32 displays the decoded video data for a user and can comprise any of a variety of display devices such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display or another type of display device. 
[0052] Each of the video encoder 20 and the video decoder 30 can be implemented as any of a variety of suitable encoder or decoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented partially in software, a device can store instructions for the software on suitable non-transitory computer-readable media and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 can be included in one or more encoders or decoders, either of which can be integrated as part of a combined encoder/decoder (CODEC) in a respective device. [0053] In some examples, video encoder 20 and video decoder 30 may operate according to a video coding standard, such as an existing or future standard. Example video coding standards include, but are not limited to, ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. In addition, a new video coding standard, namely High Efficiency Video Coding (HEVC) or ITU-T H.265, including its range and screen content coding extensions, 3D video coding (3D-HEVC), multiview extensions (MV-HEVC) and scalable extension (SHVC), has recently been developed by the Joint Collaborative Team on Video Coding (JCT-VC) as well as the Joint Collaboration Team on 3D Video Coding Extension Development (JCT-3V) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). [0054] In HEVC and other video coding specifications, a video sequence typically includes a series of images. Images can also be referred to as "frames". An image can include three sample matrices, denoted SL, SCb and SCr. SL is a two-dimensional matrix (that is, a block) of luma samples. SCb is a two-dimensional matrix of Cb chrominance samples. SCr is a two-dimensional matrix of Cr chrominance samples. Chrominance samples can also be referred to in this document as "chroma" samples. In other cases, an image may be monochromatic and may include only a matrix of luma samples. [0055] To generate an encoded representation of an image, the video encoder 20 can encode blocks of an image of the video data. The video encoder 20 may include, in a bit stream, an encoded representation of the video block. For example, in HEVC, to generate a coded representation of an image, the video encoder 20 can generate a set of coding tree units (CTUs). Each CTU can comprise one or more coding tree blocks (CTBs) and can comprise syntax structures used to encode the samples of the one or more coding tree blocks. For example, each CTU may comprise a luma sample coding tree block, two corresponding chroma sample coding tree blocks, and syntax structures used to code the samples of the coding tree blocks. In monochrome images or images that have three separate color planes, a CTU can comprise a single coding tree block and syntax structures used to encode the samples of the coding tree block. A coding tree block can be an NxN block of samples. A CTU can also be referred to as a "tree block" or a "largest coding unit" (LCU).
A syntax structure can be defined as zero or more syntax elements present together in the bit stream in a specified order. The size of a CTB can be in the range of 16x16 to 64x64 in the main HEVC profile (although technically CTB sizes of 8x8 can be supported). [0056] In HEVC, a slice includes an integer number of CTUs ordered consecutively in a raster scan order. Thus, in HEVC, the largest coding unit in a slice is called a coding tree block (CTB). [0057] In HEVC, to generate an encoded CTU of an image, the video encoder 20 can recursively perform quadtree partitioning on the coding tree blocks of a CTU to divide the coding tree blocks into coding blocks, hence the name "coding tree units". A coding block is an NxN block of samples. A coding unit (CU) can comprise one or more coding blocks and syntax structures used to encode the samples of the one or more coding blocks. For example, a CU may comprise a luma sample coding block and two corresponding chroma sample coding blocks of an image that has a luma sample matrix, a Cb sample matrix and a Cr sample matrix, and syntax structures used to code the samples of the coding blocks. In monochrome images or images that have three separate color planes, a CU can comprise a single coding block and syntax structures used to encode the samples of the coding block. Thus, a CTB can contain a quadtree whose nodes are CUs. [0058] Additionally, the video encoder 20 can encode a CU. For example, to encode a CU, video encoder 20 can partition a coding block of the CU into one or more prediction blocks. A prediction block is a rectangular (that is, square or non-square) block of samples to which the same prediction is applied. A prediction unit (PU) of a CU can comprise one or more prediction blocks of the CU and syntax structures used to predict the one or more prediction blocks. For example, a PU can comprise a luma sample prediction block, two corresponding chroma sample prediction blocks and syntax structures used to predict the prediction blocks. In monochrome images or images that have three separate color planes, a PU can comprise a single prediction block and syntax structures used to predict the prediction block. The video encoder 20 can generate predictive blocks (for example, luma, Cb and Cr predictive blocks) for the prediction blocks (for example, luma, Cb and Cr prediction blocks) of each PU of the CU. [0059] In HEVC, each CU is coded with one mode, which is either intra mode or inter mode. [0060] When the CU is inter-coded, a set of motion information is present for each PU. In addition, each PU is coded with a unique inter-prediction mode to derive the set of motion information. If the video encoder 20 uses intra-prediction to generate the predictive blocks of a PU, the video encoder 20 can generate the predictive blocks of the PU based on decoded samples of the image that includes the PU. When a CU is intra-coded, 2Nx2N and NxN are the only permissible PU shapes, and within each PU a single intra-prediction mode is coded (while the chroma prediction mode is signaled at the CU level). NxN intra PU shapes are only allowed when the current CU size is equal to the smallest CU size defined in a sequence parameter set (SPS). [0061] The video encoder 20 can generate one or more residual blocks for the CU. For example, video encoder 20 can generate a luma residual block for the CU.
Each sample in the CU luma residual block indicates a difference between a luma sample in one of CU's predictive luma blocks and a corresponding sample in the original CU luma coding block. In addition, video encoder 20 can generate a residual block Cb for the CU. Each sample in the residual block Cb of a CU can indicate a difference between a sample Cb in one of the predictive Cb blocks of the CU and a corresponding sample in the original Cb coding block of the CU. The video encoder 20 can also generate a residual block Cr for the CU. Each sample in the CU residual Cr block can indicate a difference between a Cr sample in one of the CU predictive Cr blocks and a corresponding sample in the original CU Cr coding block. [0062] [0062] Additionally, the video encoder 20 can decompose the residual blocks of a CU into one or more transform blocks. For example, the video encoder can use the quadtree partition to decompose the residual blocks of a CU into one or more transform blocks. A transform block is a rectangular block (for example, square or non-square) of samples to which the same transform is applied. A transform unit (TU) of a CU can comprise one or more transform blocks. For example, a TU may comprise a luma sample transform block, two corresponding chroma sample transform blocks, and syntax structures used to transform the transform block samples. Thus, each CU of a CU can have a luma transform block, a Cb transform block and a Cr transform block. The TU luma transform block can be a sub-block of the CU luma residual block. The transform block Cb can be a sub-block of the residual block Cb of CU. The transform block Cr can be a sub-block of the residual block Cr of CU. In monochrome images or images that have three separate color planes, a TU can comprise a single transform block and syntax structures used to transform the samples in the transform block. [0063] [0063] The video encoder 20 can apply one or more transforms in a transform block of a TU to generate a coefficient block for the TU. For example, video encoder 20 can apply one or more transforms to a luma transform block of a TU to generate a lum coefficient block for the TU. A coefficient block can be a two-dimensional matrix of transform coefficients. A transform coefficient can be a scalar quantity. The video encoder 20 can apply one or more transforms in a transform block Cb of a TU to generate a coefficient block Cb for the TU. The video encoder 20 can apply one or more transforms in a transform block Cr of a TU to generate a block of coefficient Cr for the TU. [0064] [0064] In some examples, the video encoder ignores the application of transforms in the transform block. In such examples, the video encoder 20 can treat residual sample values in the same way as the transform coefficients. Thus, in the examples where the video encoder 20 ignores the application of the transforms, the following discussion of transform coefficients and coefficient blocks may be applicable to transform blocks of residual samples. [0065] [0065] After the generation of a coefficient block, the video encoder 20 can quantify the coefficient block. Quantification refers, in general, to a process in which the transform coefficients are quantified to possibly reduce the amount of data used to represent the transform coefficients, providing additional compression. In some instances, video encoder 20 ignores quantization. 
After the video encoder 20 quantizes a coefficient block, the video encoder 20 can generate syntax elements that indicate the quantized transform coefficients. The video encoder 20 can entropy encode one or more of the syntax elements that indicate the quantized transform coefficients. For example, video encoder 20 can perform binary context-adaptive arithmetic (CABAC) encoding on the syntax elements that indicate the quantized transform coefficients. [0066] [0066] Video encoder 20 can output a bit stream that includes encoded video data. For example, the bit stream may comprise a bit stream that forms a representation of encoded images of the video data and associated data. In this way, the bit stream comprises an encoded representation of video data. In some examples, a representation of an encoded image may include encoded representations of blocks. [0067] [0067] The bit stream may comprise a sequence of network abstraction layer (NAL) units. An NAL unit is a syntax structure that contains an indication of the type of data in the NAL unit and bytes that contain that data in the form of a raw byte sequence payload (RBSP) interspersed, as necessary, with bits preventing emulation. Each NAL unit can include an NAL unit header and encapsulates an RBSP. The NAL unit header can include a syntax element that indicates an NAL unit type code. The NAL unit type code specified by the NAL unit header of an NAL unit indicates the type of the NAL unit. An RBSP can be a syntax structure that contains an integer number of bytes that is encapsulated within an NAL unit. In some cases, an RBSP includes zero bits. [0068] [0068] The video decoder 30 can receive a bit stream generated by the video encoder 20. Furthermore, the video decoder 30 can analyze the bit stream to obtain syntax elements from the bit stream. The video decoder 30 can reconstruct the images of the video data based, at least in part, on the syntax elements obtained from the bit stream. The process for reconstructing the video data can, in general, be reciprocal to the process performed by the video encoder 20. For example, the video decoder 30 can use PU motion vectors to determine predictive blocks for the PUs of a current CU . In addition, the video decoder 30 can inversely quantify the TU coefficient blocks of the current CU. The video decoder 30 can perform inverse transforms in the coefficient blocks to reconstruct transform blocks of the current CU's TUs. The video decoder 30 can reconstruct the encoding blocks of the current CU by adding the samples of the predictive blocks for PUs of the current CU to corresponding samples of the transform blocks of the current CU's TUs. By reconstructing the encoding blocks for each CU of an image, the video decoder 30 can reconstruct the image. [0069] [0069] In 2016, MPEG and ITU-T VCEG formed a joint exploration video team (JVET) to explore new encoding tools for the next generation of video encoding standards. The reference software is called JEM (joint exploration model). For each block, a set of movement information may be available. A set of motion information contains motion information for the forward and backward prediction directions. Here, the forward and backward prediction directions are two directions of prediction in a bidirectional prediction mode and the terms "forward" and "backward" do not necessarily have a geometry meaning, preferably the terms correspond to the list reference image O (RefPicList0O) and reference image list 1 (RefPicListl) of a current image. 
When only one reference image list is available for an image or slice, only RefPicList0 is available and the motion information of each block in a slice is always forward. [0070] In some cases, a motion vector together with its reference index is used in decoding processes; such a motion vector with the associated reference index is denoted as a uni-predictive set of motion information. In some systems, for each prediction direction, the motion information must contain a reference index and a motion vector. In some cases, for simplicity, a motion vector by itself may be referred to in a way that it is assumed to have an associated reference index. A reference index is used to identify a reference image in the current reference image list (RefPicList0 or RefPicList1). A motion vector has a horizontal and a vertical component. [0071] Picture order count (POC) is widely used in video coding standards to identify the display order of an image. Although there are cases where two images within one coded video sequence can have the same POC value, this typically does not happen within a coded video sequence. When multiple coded video sequences are present in a bit stream, images with the same POC value may be closer to each other in terms of decoding order. The POC values of images can be used for reference image list construction, derivation of the reference image set as in HEVC, and motion vector scaling. [0072] In H.264/AVC, each inter macroblock (MB) can be partitioned in four different ways: one 16x16 MB partition; two 16x8 MB partitions; two 8x16 MB partitions; or four 8x8 MB partitions. Different MB partitions in an MB can have different reference index values for each direction (RefPicList0 or RefPicList1). When an MB is not partitioned into four 8x8 MB partitions, the MB can have only one motion vector for each MB partition in each direction. [0073] In H.264/AVC, when an MB is partitioned into four 8x8 MB partitions, each 8x8 MB partition can be further partitioned into sub-blocks, each of which may have a different motion vector in each direction. There are four different ways to obtain sub-blocks from an 8x8 MB partition: one 8x8 sub-block; two 8x4 sub-blocks; two 4x8 sub-blocks; or four 4x4 sub-blocks. Each sub-block can have a different motion vector in each direction. Therefore, a motion vector is present at a level equal to or higher than the sub-block level. [0074] In AVC, the temporal direct mode could be enabled at the MB or MB partition level for direct or skip mode in B slices. For each MB partition, the motion vectors of the block colocated with the current MB partition are used to derive the motion vectors. [0075] In HEVC, the largest coding unit in a slice is called a coding tree block (CTB). A CTB contains a quadtree whose nodes are coding units. The size of a CTB can range from 16x16 to 64x64 in the main HEVC profile (although technically CTB sizes of 8x8 can be supported). A coding unit (CU) can be the same size as a CTB and as small as 8x8. Each coding unit is coded with one mode (intra mode or inter mode). When a CU is inter-coded, the CU can be further partitioned into 2 or 4 prediction units (PUs) or become just one PU when further partitioning does not apply. When two PUs are present in one CU, they can be half-size rectangles or two rectangles with 1/4 or 3/4 the size of the CU. [0076] When the CU is inter-coded, one set of motion information is present for each PU.
In addition, each PU is coded with a unique inter-prediction mode to derive the set of motion information. [0077] In the HEVC standard, there are two inter-prediction modes, called merge mode (skip is considered a special case of merge) and advanced motion vector prediction (AMVP) mode, respectively, for a prediction unit (PU). [0078] In AMVP or merge mode, a motion vector (MV) candidate list is maintained for multiple motion vector predictors. The motion vector (or vectors), as well as the reference indexes in merge mode, of the current PU are generated by taking a candidate from the MV candidate list. [0079] The MV candidate list contains up to 5 candidates for merge mode and only two candidates for AMVP mode. A merge candidate can contain a set of motion information, for example, motion vectors corresponding to both reference image lists (list 0 and list 1) and the reference indexes. If a merge candidate is identified by a merge index, the reference images are used for the prediction of the current blocks, and the associated motion vectors are determined. However, under AMVP mode, for each potential prediction direction from either list 0 or list 1, a reference index needs to be explicitly signaled, together with an MVP index to the MV candidate list, since an AMVP candidate contains only a motion vector. In AMVP mode, the predicted motion vectors can be further refined. [0080] A merge candidate corresponds to a full set of motion information, while an AMVP candidate contains only one motion vector for a specific prediction direction and reference index. The candidates for both modes are derived similarly from the same spatial and temporal neighboring blocks. Spatial MV candidates are derived from the neighboring blocks shown in Figures 2A and 2B for a specific PU (PU0), although the techniques that generate the candidates from the blocks differ for the merge and AMVP modes. [0081] Figures 2A and 2B are conceptual diagrams that illustrate spatial neighboring candidates in HEVC. In some examples, video encoder 20 and/or video decoder 30 may derive spatial motion vector (MV) candidates from neighboring block 0, neighboring block 1, neighboring block 2, neighboring block 3 or neighboring block 4 of PU0. [0082] In some cases, the techniques for generating the MV candidates from the blocks differ for the merge and AMVP modes. Figure 2A illustrates an example for merge mode. For example, in HEVC, a video coder (for example, video encoder 20 and/or video decoder 30 in Figure 1) can derive up to four spatial MV candidates. The candidates may be included in a candidate list that has a particular order. In one example, the order for the example in Figure 2A can be neighboring block 0 (A1), neighboring block 1 (B1), neighboring block 2 (B0), neighboring block 3 (A0) and neighboring block 4 (B2). [0083] Figure 2B illustrates an example for AMVP mode. For example, in HEVC, the video coder can divide the neighboring blocks into two groups: a left group that includes neighboring block 0 and neighboring block 1, and an above group that includes neighboring block 2, neighboring block 3 and neighboring block 4. For each group, the potential motion vector candidate associated with a neighboring block that refers to the same reference image as that indicated by the signaled reference index (for the block that is currently being coded) may have the highest priority to be chosen to form a final candidate for the group.
It is possible that none of the neighboring blocks contains a motion vector that points to the same reference image. Therefore, if such a candidate cannot be found, the video coder can scale the first available candidate to form the final candidate; thus, differences in temporal distance can be compensated. [0084] According to aspects of this disclosure, motion vector candidates, such as the motion vectors associated with the neighboring blocks shown in Figures 2A and 2B, can be used to derive a motion vector for a block. For example, the video coder can generate a candidate list that includes motion vector candidates (for example, a candidate list of motion vector information) from the neighboring blocks shown in Figures 2A and 2B. In this example, the video coder can use one or more of the candidates in the candidate list as an initial motion vector (for example, initial motion vector information) in a motion information derivation process (for example, bilateral matching, template matching or the like). [0085] Figures 3A and 3B are conceptual diagrams that illustrate HEVC temporal motion vector prediction. A temporal motion vector predictor (TMVP) candidate, if enabled and available, is added to the MV candidate list after the spatial motion vector candidates. In HEVC, the motion vector derivation process for a TMVP candidate is the same for both merge mode and AMVP; however, the target reference index for the TMVP candidate in merge mode is typically set to zero. [0086] Figure 3A illustrates a primary block location (shown as block "T") for a TMVP candidate. [0087] Figure 3B illustrates the derivation of a TMVP candidate 84 for a current block 86 of a current image 88 from a colocated PU 90 of a colocated image 92, as indicated at the slice level (for example, in a slice header). Similar to the temporal direct mode in AVC, a motion vector of the TMVP candidate may be subject to motion vector scaling, which is performed to compensate for distance differences, for example, temporal distances between images. With respect to motion vector scaling, a video coder (such as video encoder 20 and/or video decoder 30) can be configured to initially determine that the value of a motion vector is proportional to the distance between images in presentation time. A motion vector associates two images: the reference image and the image containing the motion vector (that is, the containing image). When a motion vector is used to predict another motion vector, the distance between the containing image and the reference image is calculated based on the POC values. [0088] For a motion vector to be predicted, both the associated containing image of the motion vector and the reference image of the motion vector can be different. Therefore, the video coder can calculate a new distance based on the POC values, and the video coder can scale the motion vector based on these two POC distances. For a spatial neighboring candidate, the containing images of the two motion vectors are the same, while the reference images are different. In HEVC, motion vector scaling applies to both TMVP and AMVP for spatial and temporal neighboring candidates. [0089] In some examples, a video coder can be configured to determine one or more artificial motion vector candidates.
For example, if a list of motion vector candidates is not complete, the video coder can generate artificial motion vector candidates and insert the artificial motion vector candidates at the end of the list until the list includes a predetermined number of entries. In merge mode, there are two types of artificial MV candidates, including a combined candidate derived only for B slices and a zero candidate. In some cases, the zero candidate is used only for AMVP if the combined type does not provide enough artificial candidates. [0090] For each pair of candidates that are already on the candidate list and have the necessary motion information, bidirectional combined motion vector candidates are derived by a combination of the motion vector of a first candidate that refers to an image in list 0 and the motion vector of a second candidate that refers to an image in list 1. [0091] According to aspects of this disclosure, motion vector candidates, such as the TMVP shown in Figures 3A and 3B, can be used to derive a motion vector for a block. For example, the video coder can generate a candidate list that includes a TMVP determined according to the process described above. In this example, the video coder can use the TMVP as an initial motion vector in a motion information derivation process (for example, bilateral matching, template matching or the like). The video coder can apply the TMVP in a motion vector derivation process to identify reference data. The video coder can select the TMVP in cases where the TMVP identifies closely matching reference data. The video coder may, in some cases, further refine the TMVP to determine a derived motion vector using the motion information derivation process. [0092] In some examples, the video coder may prune a candidate list that includes motion vector candidates. For example, in some cases, candidates from different blocks may happen to be the same, which decreases the efficiency of an AMVP/merge candidate list. The video coder may apply a pruning process to resolve this issue. The video coder can compare one candidate against the others in the current candidate list to avoid inserting an identical candidate. To reduce complexity, the video coder can apply only a limited number of pruning operations instead of comparing each potential candidate with all the other existing ones. [0093] In some examples, the value of a motion vector is proportional to the distance between images in presentation time. In such examples, a motion vector can associate two images: the reference image and the image containing the motion vector (that is, the containing image). When a motion vector is used to predict another motion vector, the distance between the containing image and the reference image is calculated based on the POC values. [0094] For a motion vector to be predicted, both its associated containing image and the reference image may be different. Therefore, a new distance (for example, based on POC) is calculated, and the motion vector is scaled based on these two POC distances. For a spatial neighboring candidate, the containing images of the two motion vectors are the same, while the reference images are different. In HEVC, motion vector scaling applies to both TMVP and AMVP for spatial and temporal neighboring candidates.
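A small sketch of the POC-distance scaling described in the last two paragraphs is shown below. It uses floating-point arithmetic and illustrative argument names; actual codecs such as HEVC use a clipped fixed-point approximation of the same ratio.

```python
def scale_mv(mv, current_poc, target_ref_poc, candidate_container_poc, candidate_ref_poc):
    """Scale a candidate motion vector in proportion to POC distances: the vector
    spans (containing image -> reference image) of the candidate and is stretched
    to span (current image -> target reference image)."""
    tb = current_poc - target_ref_poc                 # distance the scaled vector must cover
    td = candidate_container_poc - candidate_ref_poc  # distance covered by the candidate vector
    if td == 0:
        return mv
    scale = tb / td
    return (round(mv[0] * scale), round(mv[1] * scale))


# Example: a temporal candidate taken from an image at POC 8 pointing to POC 0,
# reused for a current image at POC 4 whose target reference is at POC 0.
print(scale_mv((16, -8), current_poc=4, target_ref_poc=0,
               candidate_container_poc=8, candidate_ref_poc=0))   # -> (8, -4)
```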
[0095] [0095] If a list of motion vector candidates is not complete, artificial motion vector candidates can be generated and inserted at the end of the list until the list of motion vector candidates includes a predetermined number of candidates. In blending mode, there are two types of artificial MV candidates: the combined candidate, derived only for B slices, and zero candidates, used only for AMVP if the first type does not provide enough artificial candidates. For each pair of candidates that are already on the candidate list and have the required motion information, bidirectional combined motion vector candidates can be derived by combining the motion vector of a first candidate that refers to an image in list 0 with the motion vector of a second candidate that refers to an image in list 1. [0096] [0096] Candidates from different blocks may happen to be the same, which decreases the efficiency of an AMVP/merge candidate list. A removal process is applied to resolve this issue. The removal process compares one candidate against the others in the current candidate list to avoid inserting an identical candidate. To reduce complexity, only a limited number of removal comparisons is applied instead of comparing each potential candidate with all others already in the list. [0097] [0097] Figure 4 is a conceptual diagram that illustrates an example of unilateral motion estimation (ME) in frame rate up-conversion (FRUC). In particular, Figure 4 illustrates a current frame 100, a reference frame 102 and an interpolated frame 104. In some cases, a video decoder or post-processing device may interpolate images based on one or more reference images. The video decoder or post-processing device can interpolate images to up-convert an original frame rate of an encoded bit stream. Alternatively, the video decoder or post-processing device can interpolate images to insert one or more images that were skipped by a video encoder in order to encode a video sequence at a reduced frame rate. In either case, the video decoder or post-processing device interpolates frames (such as interpolated frame 104) that are not included in the encoded bit stream received by the video decoder, using images that were decoded (such as the current frame 100 and the reference frame 102). The video decoder or post-processing device can interpolate the images using any one of several interpolation techniques, for example, motion compensated frame interpolation, frame repetition or frame averaging. [0098] [0098] The frame interpolation techniques noted above are typically implemented outside the coding loop. For example, a video decoder typically receives and decodes an encoded bit stream to generate a reconstructed representation of a video sequence that includes the current frame 100 and the reference frame 102. After the decoding loop, the video decoder or another post-processing device can interpolate images to be included with the reconstructed representation, including the interpolated frame 104. In some cases, the image interpolation process may be referred to as frame rate up-conversion (FRUC), due to the fact that the resulting image sequence includes additional (interpolated) images that were not included in the encoded bit stream. [0099] [0099] Consequently, FRUC technology can be used to generate videos with a high frame rate based on videos with a low frame rate. FRUC has been used in the display industry. Examples include, for example, X. Chen, J. An, J.
Zheng, "EE3: Decoder-Side Motion Vector Refinement Based on Bilateral Template Matching", JVET-E0052, January 2017; W. H. Lee, K. Choi, J. B. Ra, "Frame rate up conversion based on variational image fusion", IEEE Transactions on Image Processing, vol. 23, No. 1, January 2014; and U.-S. Kim, M. H. Sunwoo, "New frame rate up-conversion algorithms with low computational complexity", IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, No. 3, March 2014. [0100] [0100] FRUC algorithms can be divided into two types. One type of method interpolates intermediate frames by simply repeating or averaging frames. However, this method provides inadequate results for an image that contains a lot of movement. The other type of technique, called FRUC with motion compensation (MC-FRUC), considers the movement of the object when MC-FRUC generates intermediate frames, and consists of two steps: motion estimation (ME) and motion-compensated interpolation (MCI). ME generates motion vectors (MVs), which represent the movement of the object using vectors, while MCI uses the MVs to generate intermediate frames. [0101] [0101] The block matching algorithm (BMA) is widely used for ME in MC-FRUC, as the BMA is simple to implement. The BMA divides an image into blocks and detects the movement of those blocks, for example, to determine whether the blocks match. Two types of ME are mainly used for BMA: unilateral ME and bilateral ME. [0102] [0102] As shown in Figure 4, unilateral ME obtains MVs by searching for the best matching block in the reference frame 102 for a block of the current frame 100. Then, the block on the motion trajectory in the interpolated frame can be located so that the MV is achieved. As shown in Figure 4, three blocks, namely 106A, 106B and 106C from the current frame 100, the reference frame 102 and the interpolated frame 104, respectively, are involved in following the motion trajectory. Although block 106A in the current frame 100 belongs to an encoded block, the best matching block 106B in the reference frame 102 may not belong entirely to an encoded block, and neither may block 106C in the interpolated frame 104. [0103] [0103] To deal with overlaps, simple FRUC algorithms simply involve averaging and overwriting the overlapped pixels. In addition, holes are covered by pixel values from a reference or current frame. However, these algorithms result in blocking and blurring artifacts. Therefore, motion field segmentation, successive extrapolation using the discrete Hartley transform and image inpainting have been proposed to deal with holes and overlaps without increasing blurring and blocking artifacts. [0104] [0104] Figure 5 is a conceptual diagram that illustrates an example of bilateral motion estimation (ME) in FRUC. In particular, Figure 5 illustrates an interpolated block 108 of an interpolated frame 110 that is interpolated from a current block 112 of a current frame 114 and a reference block 116 of a reference frame. [0105] [0105] According to aspects of this disclosure, the bilateral motion estimation shown in the example of Figure 5 can be used to derive motion information. For example, a video encoder (such as video encoder 20 or video decoder 30) can apply bilateral matching as a motion information derivation mode to derive motion information during encoding. In bilateral matching, the video encoder can perform a motion search for a first set of reference data in a first reference image that corresponds to a second set of reference data in a second reference image.
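The block-matching search that underlies the unilateral ME of Figure 4 (and, with two symmetric searches, the bilateral ME of Figure 5) can be sketched as follows; this is a simplified illustration using an exhaustive SAD search, whereas practical BMA implementations use fast search patterns:

```python
import numpy as np

def unilateral_me(cur, ref, bx, by, size=8, search=4):
    """Find the best-matching reference block for the current block at (bx, by)
    and place the block halfway along the motion trajectory in the interpolated frame."""
    block = cur[by:by + size, bx:bx + size].astype(np.int32)
    best_mv, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + size > ref.shape[0] or x + size > ref.shape[1]:
                continue                          # stay inside the reference picture
            sad = np.abs(block - ref[y:y + size, x:x + size].astype(np.int32)).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    # Position of the block in the interpolated frame, at half the displacement.
    interp_pos = (bx + best_mv[0] // 2, by + best_mv[1] // 2)
    return best_mv, interp_pos
```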
[0106] [0106] According to other aspects of this disclosure, a video encoder (such as video encoder 20 and/or video decoder 30) can generate the interpolated frame in the encoding or decoding loop using the bilateral matching technique shown in Figure 5. For example, the video encoder can use picture-level FRUC to interpolate the interpolated image as a predictor of the current image, using the reconstructed pixel array. In some instances, such an interpolated image may be considered a reference image or the reconstruction of the current image. In other examples, the video encoder can set the current image equal to the interpolated image. Such an image can be marked as a discardable image and/or a non-reference image by syntax elements or decoding processes. [0107] [0107] Figure 6 is a conceptual diagram that illustrates an example of decoder-side motion vector derivation (DMVD) based on model matching. With advanced video codecs, the percentage of bits spent on motion information in the bit stream is increasing. In some cases, DMVD can reduce the bit cost of motion information. Model-based DMVD may exhibit an improvement in coding efficiency, as described, for example, in S. Kamp, M. Wien, "Decoder-side motion vector derivation for block-based video coding", IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, No. 12, December 2012. [0108] [0108] In the example of Figure 6, a current image 120 includes a prediction target 122 (for example, a block that is currently being encoded) and a model 124. Reference images 126 include a colocalized model 128, a best match 130 and a displacement vector 132. A video encoder (such as video encoder 20 and/or video decoder 30) can use model 124 to search for the best match for prediction target 122 (for example, instead of using the prediction target 122 itself, which has yet to be encoded). For example, the video encoder can perform a motion search to identify a first set of reference data (for example, best match 130) that corresponds to a second set of reference data outside of prediction target 122 (for example, model 124). As noted above, the correspondence can be determined based on an amount of similarity between the reference data and may be referred to in this document as determining a "match" or "best match". [0109] [0109] In the example shown, the video encoder can identify the colocalized model 128 in reference images 126. The video encoder can then search for the best match 130, which includes pixel values that are similar to those of model 124. The video encoder can determine the displacement vector 132 based on the displacement between the colocalized model 128 and the best match 130 in reference images 126. [0110] [0110] Assuming that model 124 and prediction target 122 belong to the same object, the motion vector of the model can be used as the motion vector of the prediction target. Therefore, in the example of Figure 6, the video encoder can apply displacement vector 132 to prediction target 122. Since model matching is conducted in both a video encoder and a video decoder, the motion vector can be derived on the decoder side to avoid signaling costs.
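A minimal sketch of the model (template) matching of Figure 6 follows; it assumes an exhaustive SAD search over a small window and is not the normative derivation process:

```python
import numpy as np

def template_match(ref, template, tx, ty, search=8):
    """Compare the reconstructed template (located at (tx, ty) in the current
    picture) against candidate positions in the reference picture; the
    displacement of the best SAD match is the derived motion vector."""
    th, tw = template.shape
    best_sad, best_dv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = ty + dy, tx + dx
            if y < 0 or x < 0 or y + th > ref.shape[0] or x + tw > ref.shape[1]:
                continue
            sad = np.abs(template.astype(np.int32) -
                         ref[y:y + th, x:x + tw].astype(np.int32)).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_dv = sad, (dx, dy)
    return best_dv   # applied to the prediction target as its motion vector
```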
[0111] [0111] According to aspects of this disclosure, the video encoder can apply model matching as a motion information derivation mode. For example, the video encoder can apply model matching to obtain motion information for a current block by finding a best match between model 124 of the current image and the corresponding reference data in reference images 126. Although the example of Figure 6 illustrates model 124 as a single block of video data, it should be understood that other models can be used. For example, the video encoder can use multiple blocks as a model, for example, one or more blocks positioned to the left of prediction target 122 and one or more blocks positioned above prediction target 122. [0112] [0112] According to aspects of this disclosure, the video encoder can apply the model matching techniques shown in Figure 6 using one or more motion vectors from a list of motion vector candidates. For example, the video encoder can be configured to determine one or more candidate motion vectors using any combination of the techniques described in this document (for example, blending mode candidates, AMVP candidates, a TMVP or the like). The video encoder can then be configured to apply one or more of the candidate motion vectors to model 124 to locate the colocalized model 128 (in this example, the location of the colocalized model 128 is dictated by the one or more candidate motion vectors and is not necessarily strictly colocalized). The video encoder can be configured to determine which of the candidate motion vectors results in the best match between model 124 and colocalized model 128. [0113] [0113] According to aspects of this disclosure, the video encoder can then be configured to refine the candidate motion vector to derive motion information for prediction target 122. For example, the video encoder can search for a best match to model 124 in a region of reference images 126 identified by the candidate motion vector. Upon determining a best match, the video encoder can determine a displacement between model 124 and the determined best match. The video encoder may designate the displacement as a derived motion vector for prediction target 122. [0114] [0114] Figure 7 is a conceptual diagram that illustrates an example of bidirectional motion vector derivation in DMVD. Another category of DMVD is mirror-based bidirectional MV derivation, as described, for example, in Y.-J. Chiu, L. Xu, W. Zhang, H. Jiang, "Decoder-side Motion Estimation and Wiener filter for HEVC", Visual Communications and Image Processing (VCIP), 2013. [0115] [0115] The example of Figure 7 includes the current image 140 that has the current block 142 (the block that is currently being encoded), a first candidate motion vector PMV0 that identifies a first model block 144 of a first reference image 146 (L0 ref) and a second candidate motion vector PMV1 that identifies a second model block 148 of a second reference image 150. The video encoder can apply dMV as an offset to locate a first reference block 152 in the search window 154 of the first reference image 146 and to locate a second reference block 156 in the search window 158 of the second reference image 150. [0116] [0116] For example, the video encoder can add dMV to PMV0 and subtract dMV from PMV1 to generate an MV pair, MV0 and MV1. The video encoder can check all dMV values within the search windows 154 and 158 to determine which dMV value results in the best match between the first reference block 152 (for example, a first set of reference data) of L0 ref and the second reference block 156 (for example, a second set of reference data) of L1 ref. In some examples, the video encoder can determine the best match based on the Sum of Absolute Differences (SAD). In other examples, the video encoder can use another metric to determine the best match. The size and location of search windows 154 and 158 can be predefined or can be signaled in a bit stream.
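The mirror-based search over dMV described in the preceding paragraph can be sketched as follows (a simplified illustration that assumes the search window stays inside both reference pictures and uses SAD as the matching metric):

```python
import numpy as np

def mirror_bidirectional_search(ref0, ref1, pmv0, pmv1, pos, size=8, search=4):
    """For every offset dMV in the window, compare the block at pmv0 + dMV in
    ref0 with the block at pmv1 - dMV in ref1, and keep the pair with minimum SAD."""
    def block(ref, mv):
        x, y = pos[0] + mv[0], pos[1] + mv[1]
        return ref[y:y + size, x:x + size].astype(np.int32)

    best_sad, best_pair = None, (pmv0, pmv1)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            mv0 = (pmv0[0] + dx, pmv0[1] + dy)
            mv1 = (pmv1[0] - dx, pmv1[1] - dy)   # mirrored offset: opposite sign
            sad = np.abs(block(ref0, mv0) - block(ref1, mv1)).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_pair = sad, (mv0, mv1)
    return best_pair
```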
[0117] [0117] The video encoder can select the MV pair with the minimum SAD as the output of the central-symmetric motion estimation. Since this technique uses a future reference (a reference at a temporal position after the current frame) and an earlier reference (a reference at a temporal position before the current frame) for the SAD matching, the selection of the MV pair with minimum SAD cannot be applied to P frames or low-delay B frames, in which only the earlier reference is available. [0118] [0118] In accordance with aspects of this disclosure, the video encoder can apply the mirror-based bidirectional motion vector derivation techniques as a motion information derivation mode. In some examples, the video encoder can apply the techniques shown in Figure 7 using one or more motion vectors from a list of motion vector candidates. For example, the video encoder can be configured to determine one or more candidate motion vectors using any combination of the techniques described in this document (for example, blending mode candidates, AMVP candidates, a TMVP or the like). The video encoder can then be configured to apply one or more of the candidate motion vectors as PMV0 and/or PMV1 to locate the first model block 144 and the second model block 148. The video encoder can be configured to determine which of the candidate motion vectors results in the best match between the first model block 144 and the second model block 148. [0119] [0119] According to aspects of this disclosure, the video encoder can be configured to refine the candidate motion vector to derive motion information for the current block 142. For example, the video encoder can search for a best match by applying a variety of dMV values, as described above. In this way, the video encoder can derive the MV pair, MV0 and MV1. [0120] [0120] Figure 8A is a conceptual diagram that illustrates motion vector derivation based on extended bilateral matching. A potential disadvantage of mirror-based bidirectional MV derivation (for example, as shown in Figure 7) is that mirror-based bidirectional MV derivation does not work when both references of the current image are before, or both are after, the current image. The extended bilateral matching techniques described in this document can, in some cases, overcome the disadvantage in which all reference images of the current image are on the same side (in display order) as the current image. [0121] [0121] The example of Figure 8A includes a current image 160 that includes a current block 162, a first reference image (Ref0) 164 that includes a first reference block 166 and a second reference image (Ref1) 168 that includes a second reference block 170. As shown in Figure 8A, the first reference image (Ref0) 164 and the second reference image (Ref1) 168 are both located before the current image in the temporal direction. [0122] [0122] The video encoder can select the final pair MV0 and MV1 as the pair that minimizes the matching cost between the pair of blocks pointed to by MV0 and MV1. Theoretically, the current block 162 can be considered as an extrapolated block based on the first reference block 166 and the second reference block 170. It should be noted that the extended bilateral matching also works in a bidirectional case in which the current frame is temporally between the two references. In this case, the current block 162 can be considered as an interpolated block based on the first reference block 166 and the second reference block 170. In addition, the bilateral matching techniques described in this document do not require a "mirrored relationship" between MV0 and MV1, even in the bidirectional case. The assumption of bilateral matching is that the ratio between MV0 and MV1 is proportional to the ratio between the temporal distance from Ref0 to the current image and that from Ref1 to the current image.
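Under the proportionality assumption just stated, a candidate MV0 determines its paired MV1 by scaling with the temporal distances, as in the following small sketch (signed POC distances are assumed, so the extrapolated and interpolated cases are covered by the same expression):

```python
def paired_mv(mv0, td0, td1):
    """Given MV0 toward Ref0 at signed temporal distance td0, derive the
    proportional MV1 toward Ref1 at signed temporal distance td1 along the
    same motion trajectory (td0 is assumed to be non-zero)."""
    return (round(mv0[0] * td1 / td0), round(mv0[1] * td1 / td0))
```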
[0123] [0123] Clearly, for reference blocks other than the first reference block 166 and the second reference block 170, the video encoder can derive a different MV pair. In one example, the video decoder can select the reference images for performing bilateral matching according to the order in which the reference images appear in a reference image list. For example, the video encoder can select the first reference in reference list 0 as Ref0 and the first reference in reference list 1 as Ref1. The video encoder can then search for the MV pair (MV0, MV1). In another example, the video encoder selects Ref0 based on an entry in an initial list (for example, an initial list of motion vector candidates). The video encoder can then set Ref1 to a reference image in the other reference image list that is temporally closest to the current image. Consequently, the video encoder can search for the MV pair (MV0, MV1) in Ref0 and Ref1. [0124] [0124] Accordingly, in line with aspects of this disclosure, the video encoder can apply the extended bidirectional motion derivation techniques illustrated in Figure 8A as a motion information derivation mode. For example, the video encoder can use bilateral matching to derive motion information for the current block 162 by finding the best match between two blocks (for example, the first reference block 166 and the second reference block 170) along the motion trajectory of the current block in two different reference images. Under the assumption of a continuous motion trajectory, the motion vectors MV0 and MV1 that point to the two reference blocks, the first reference block 166 and the second reference block 170, must be proportional to the temporal distances, that is, TD0 and TD1, between the current image and the two reference images. As a special case, when the current image 160 is temporally between the two reference images (as shown in the example of Figure 7) and the temporal distance from the current image to the two reference images is the same, the bilateral matching becomes mirror-based bidirectional MV derivation. [0125] [0125] Figure 8B is a flowchart that illustrates an example of decoding a prediction unit (PU) using DMVD. In Y.-J. Chiu, L. Xu, W. Zhang, H. Jiang, "Decoder-side Motion Estimation and Wiener filter for HEVC", Visual Communications and Image Processing (VCIP), 2013, it was further proposed to combine mirror-based bidirectional MV derivation with the HEVC blending mode. In the proposed technique, a flag called pu_dmvd_flag is added for a PU of a B slice to indicate whether a DMVD mode is applied to the current PU. Due to the fact that the DMVD mode does not explicitly transmit any MV information in the bit stream, the pu_dmvd_flag syntax element is integrated into the HEVC blending mode syntax (which uses an index into data representative of a motion vector, instead of the motion vector itself). [0126] [0126] In the example of Figure 8B, a video decoder (such as video decoder 30) can begin to decode a PU (180). The video decoder 30 can determine whether the mode used to decode the PU is the blending mode (182), for example, based on syntax included in a bit stream that includes the PU. If the blending mode is not used (the "no" branch of step 182), the video decoder 30 can use a regular process for a non-blending PU to decode the PU (184) and complete the process (186). [0127] [0127] If the blending mode is used (the "yes" branch of step 182), the video decoder 30 can determine whether DMVD is used to determine the motion information for the PU based on the value of the pu_dmvd_flag syntax element (188). If DMVD is not used (the "no" branch of step 188), the video decoder 30 can use a regular blending mode to decode the PU (190) and complete the process (186). If DMVD is used (the "yes" branch of step 188), the video decoder 30 can apply a DMVD process to determine the motion information for the PU (192) and complete the process (186).
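The decision flow of Figure 8B can be summarized by the following sketch; the helper functions are hypothetical placeholders for the regular decoding, regular blending-mode and DMVD processes, not APIs defined by this disclosure:

```python
def decode_pu(pu, bitstream):
    """Control flow of Figure 8B: regular PU decode, regular blending mode, or DMVD."""
    if not pu.merge_flag:
        return decode_non_merge_pu(pu, bitstream)      # step 184
    if not pu.pu_dmvd_flag:                            # flag parsed with the merge syntax
        return decode_regular_merge_pu(pu, bitstream)  # step 190
    motion = derive_motion_dmvd(pu)                    # step 192: no MV in the bitstream
    return motion_compensate(pu, motion)
```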
[0128] [0128] To find a block's motion vector, fast motion search methods are used in many practical video codecs. There are many fast motion search methods proposed in the literature, such as the block-based gradient descent search (BBGDS), as described in Lurng-Kuo Liu, Ephraim Feig, "A block-based gradient descent search algorithm for block motion estimation in video coding", IEEE Trans. Circuits Syst. Video Technol., vol. 6, pages 419 to 422, August 1996; the unrestricted center-biased diamond search (UCBDS), as described in Jo Yew Tham, Surendra Ranganath, Maitreya Ranganath and Ashraf Ali Kassim, "A novel unrestricted center-biased diamond search algorithm for block motion estimation", IEEE Trans. Circuits Syst. Video Technol., vol. 8, pages 369 to 377, August 1998; and the hexagon-based search (HEBS), as described in Ce Zhu, Xiao Lin and Lap-Pui Chau, "Hexagon-Based Search Pattern for Fast Block Motion Estimation", IEEE Trans. Circuits Syst. Video Technol., vol. 12, pages 349 to 355, May 2002. Basically, these techniques search only a certain number of positions within a search window based on predefined search patterns. These techniques usually work well when the movement is small and moderate. [0129] [0129] Figure 9 is a conceptual diagram that illustrates an example of bilateral matching. In U.S. Patent Application Publication No. 2016/0286229, an encoding method based on the frame rate up-conversion method, for example, the FRUC mode, was proposed. In general, the FRUC mode is a special blending mode, with which the motion information of a block is not signaled, but derived on the decoder side. [0130] [0130] Video encoder 20 can signal a FRUC flag for a CU when its merge flag is true. When the FRUC flag is false, video encoder 20 can signal a merge index and use the regular merge mode. When the FRUC flag is true, video encoder 20 can signal an additional FRUC mode flag to indicate which method (bilateral matching or model matching) should be used to derive the motion information for the block. [0131] [0131] During the motion derivation process, the video encoder 20 and/or the video decoder 30 can derive an initial motion vector (for example, original motion vector, initial motion vector information, etc.) for the entire CU based on bilateral matching or model matching. In this example, video encoder 20 and/or video decoder 30 can check the CU merge list and select the candidate that leads to the minimum matching cost as the starting point. In this example, video encoder 20 and/or video decoder 30 performs a local search based on bilateral matching or model matching around the starting point and takes the MV that results in the minimum matching cost as the MV for the entire CU. Subsequently, video encoder 20 and/or video decoder 30 can further refine the motion information at the sub-block level with the derived CU motion vectors as the starting points.
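The CU-level starting-point selection and local search described in the preceding paragraph can be sketched as follows, where matching_cost is assumed to be a caller-provided function that evaluates the bilateral or model matching cost of a candidate MV:

```python
def fruc_initial_mv(merge_candidates, matching_cost):
    """Pick the merge candidate with minimum matching cost as the starting point."""
    return min(merge_candidates, key=matching_cost)

def fruc_local_search(start_mv, matching_cost, search_range=8):
    """Exhaustive local search around the starting point (full-pel, illustrative only)."""
    best_mv, best_cost = start_mv, matching_cost(start_mv)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            mv = (start_mv[0] + dx, start_mv[1] + dy)
            cost = matching_cost(mv)
            if cost < best_cost:
                best_cost, best_mv = cost, mv
    return best_mv
```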
[0132] [0132] In the example of Figure 9, video encoder 20 and/or video decoder 30 can use bilateral matching to obtain motion information for the current block 201 by finding the best match between two blocks along the motion trajectory of the current block in two different reference images. For example, video encoder 20 and/or video decoder 30 can find the best match between a first reference block 202 of Ref0 and a second reference block 204 along the motion trajectory of the current block 201. [0133] [0133] Under the assumption of a continuous motion trajectory, the motion vectors MV0 206 and MV1 208 that point to the first reference block 202 and the second reference block 204, respectively, must be proportional to the temporal distances, that is, TD0 210 and TD1 212, between the current image 200 and the first reference block 202 and the second reference block 204. As a special case, when the current image 200 is temporally between the two reference images and the temporal distance from the current image to the first reference block 202 and to the second reference block 204 is the same, the bilateral matching becomes mirror-based bidirectional MV derivation. [0134] [0134] Figure 10 is a conceptual diagram that illustrates an example of model matching. In the example of Figure 10, video encoder 20 and/or video decoder 30 can use model matching to derive motion information for the current block 220 by finding the best match between a model (for example, the upper neighboring block 222 and/or the left neighboring block 224 of the current block 220) in the current image and a block (of the same size as the model) in a reference image 230. [0135] [0135] On the encoder side, video encoder 20 can make the decision on whether to use the FRUC blending mode for a CU based on RD cost selection, as is done for a normal blending candidate. That is, video encoder 20 can check the two matching modes (for example, bilateral matching and model matching) for a CU using RD cost selection. Video encoder 20 can compare the mode that leads to the minimum cost (for example, bilateral matching or model matching) with the other CU modes. If a FRUC matching mode is the most effective, video encoder 20 can set the FRUC flag to true for the CU and use the related matching mode. [0136] [0136] Figure 11 is a conceptual diagram that illustrates neighboring samples used to derive IC parameters. Local illumination compensation (LIC) is based on a linear model for illumination changes, using a scaling factor a and an offset b. The LIC can be enabled or disabled in an adaptive way for each inter-mode coded coding unit (CU). [0137] [0137] When the LIC is applied to a CU, a least squares error method is employed to derive the parameters a and b using the neighboring samples 240 of the current CU and its corresponding reference samples 242. More specifically, as shown in Figure 11, the subsampled neighboring samples (2:1 subsampling) 240 of the CU and the corresponding pixels (identified by the motion information of the current CU or sub-CU) in the reference image are used. The IC parameters are derived and applied to each prediction direction separately.
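The least squares derivation of the LIC parameters a and b can be illustrated as follows (a simplified sketch operating on the subsampled neighboring samples; it omits the fixed-point arithmetic an actual codec would use):

```python
import numpy as np

def derive_lic_parameters(cur_neighbors, ref_neighbors):
    """Least-squares fit of cur ≈ a * ref + b over the subsampled neighboring samples."""
    x = np.asarray(ref_neighbors, dtype=np.float64)
    y = np.asarray(cur_neighbors, dtype=np.float64)
    n = x.size
    denom = n * (x * x).sum() - x.sum() ** 2
    if denom == 0:
        return 1.0, 0.0                       # degenerate case: identity model
    a = (n * (x * y).sum() - x.sum() * y.sum()) / denom
    b = (y.sum() - a * x.sum()) / n
    return a, b

def apply_lic(pred_block, a, b):
    # The linear illumination model is applied to the inter prediction block.
    return a * pred_block + b
```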
[0138] [0138] When a CU is encoded with the blending mode, the LIC flag is copied from the neighboring blocks, in a way similar to the copying of motion information in blending mode; otherwise, an LIC flag is signaled for the CU to indicate whether the LIC applies or not. [0139] [0139] Figure 12 is a conceptual diagram that illustrates an example of decoder-side motion derivation based on bilateral model matching. In X. Chen, J. An, J. Zheng, "EE3: Decoder-Side Motion Vector Refinement Based on Bilateral Template Matching", JVET-E0052, January 2017, a method of decoder-side motion derivation based on bilateral model matching was proposed. The video encoder 20 and/or the video decoder 30 can generate a bilateral model 350 as a weighted combination of the two prediction blocks, from the initial MV0 of list0 and MV1 of list1, respectively, as shown in Figure 12. [0140] [0140] The model matching operation may include calculating cost measurements between the generated model Tn = (R0,n + R1,n)/2 and the sample region (around the initial prediction block) in the reference image. For each of the two reference images, the MV that yields the minimum model cost is taken as the updated MV of that list to replace the original one, that is, MV0' = arg min cost(R'0,n − Tn) (1) and MV1' = arg min cost(R'1,n − Tn) (2). [0141] [0141] The video encoder 20 and/or the video decoder 30 can use the two new MVs, for example, MV0' 360 and MV1' 362 as shown in Figure 12, for the regular bi-prediction. In some instances, video encoder 20 and/or video decoder 30 may use the Sum of Absolute Differences (SAD) as the cost measurement. [0142] [0142] Video encoder 20 and/or video decoder 30 can apply DMVD to the bi-prediction blending mode with one reference image in the past and the other reference image in the future, without the transmission of an additional syntax element. In JEM4.0, when the LIC, affine, sub-CU merge candidate or FRUC is selected for a CU, the technique is not applied. [0143] [0143] The multi-propagation nature of FRUC can potentially use an increased amount of reference samples from external memory to perform the search. For example, the video encoder 20 and/or the video decoder 30 adding bi-predicted motion vector information that corresponds to a uni-predicted motion vector to the candidate list of motion vector information can increase the amount of reference samples. In some cases, all propagation motion vectors (for example, initial motion vector information in a candidate list of motion vector information) may fall in non-contiguous regions of a reference frame, and therefore the video encoder 20 and/or video decoder 30 may fetch all of the reference samples to perform the FRUC search to find the best motion vector. This potentially increased amount of reference samples can increase the chance of cache misses and, therefore, can result in a higher latency problem in some deployments.
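The bilateral model refinement of equations (1) and (2) above can be sketched as follows, where candidates0 and candidates1 are assumed to map each candidate MV of list 0 and list 1 to its prediction block:

```python
import numpy as np

def dmvr_refine(pred0, pred1, candidates0, candidates1):
    """Build the bilateral template as the average of the two initial predictions
    and, for each list, keep the candidate MV whose prediction block has minimum
    SAD against the template (MV0' and MV1' of equations (1) and (2))."""
    template = (pred0.astype(np.int32) + pred1.astype(np.int32)) // 2

    def best(cands):
        return min(cands, key=lambda mv: np.abs(cands[mv].astype(np.int32) - template).sum())

    return best(candidates0), best(candidates1)
```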
[0144] [0144] This disclosure describes techniques that potentially address the following complexity issues in the existing FRUC design. In a first example, in the existing FRUC search, a video encoder can derive a set of propagation motion vectors (for example, initial motion vector information in a candidate list of motion vector information) and search in their surrounding areas. This can potentially increase the bandwidth requirement in the worst-case scenario. In a second example, bilateral model matching introduces an alternative way of motion refinement with respect to the regular blending mode and brings coding efficiency, but the scheme requires an additional buffer for the bilateral model used for motion refinement, which is inconsistent with other motion refinement methods and incurs additional complexity. In a third example, in the existing FRUC design, a decoder-side motion search is followed by a sub-block refinement, where each sub-block (for example, a 4x4 sub-block) can have motion vectors from different propagation points pointing to non-contiguous regions of the reference frame. The non-contiguous search range covered by each of the propagation motion vectors can increase the bandwidth requirement as well as the computational complexity, while obtaining 0.4% to 1.1% of coding gain. [0145] [0145] To address the above, several techniques are proposed below. [0146] [0146] The following techniques can be applied individually. Alternatively, any combination of these techniques can be applied. Note that the reference index information can be considered as a part of the motion information; sometimes the reference index information and the motion information are together referred to in this document as a set of motion information. [0147] [0147] In a first technique, for FRUC model matching, or bilateral matching, or both, video encoder 20 builds a list of propagation motion vectors, and the initial MV (propagation) is signaled instead of being derived. Stated differently, for example, video encoder 20 and/or video decoder 30 can construct a candidate list of motion vector information for a portion of a current frame. Video encoder 20 and/or video decoder 30 can then search only around the initial MV. The portion of the current frame can correspond to a current block of the current frame, a current coding unit of the current frame, or a plurality of coding units of the current frame.
In a different way, for example, video encoder 20 and / or video decoder 30 can remove first candidate motion vector information from an initial candidate list of motion vector information based on a block size current and / or motion vector precision for refined motion vector information to generate the motion vector information candidate list. In some examples, to remove, video encoder 20 and / or video decoder 30 may: (1) remove a merge candidate from a candidate list of motion vector information; or (2) omit the refinement of the merge candidate. To remove, the video encoder and / or video decoder 30 can determine a motion vector accuracy (e.g., pixel accuracy) for refined motion vector information. [0150] [0150] In some examples, video encoder 20 and / or video decoder 30 may remove based on the similarity of motion vectors in the list. In a different way, for example, video encoder 20 and / or video decoder 30 can remove first candidate motion vector information from an initial candidate list of motion vector information based on a similarity between the first candidate motion vector information and a second candidate motion vector information from the initial motion vector information candidate list to generate the motion vector information candidate list. In some examples, to remove, video encoder 20 and / or video decoder 30 may determine motion vector accuracy for second motion vector information candidates from the motion vector information candidate list based on a similarity between the first candidate motion vector information and the second candidate motion vector information. [0151] [0151] The similarity can be based on the distance between motion vectors. In some examples, video encoder 20 and / or video decoder 30 may use the following rule equation: [0152] [0152] During the derivation of FRUC TM propagation motion vectors, the video encoder 20 and / or the video decoder 30 can use a un prediction technique for bi prediction. In slices B, if any of the derived candidates is predicted from LO or L1 only, the video encoder 20 and / or the video decoder 30 can artificially create a motion vector paired with the opposite sign as the motion vector of the another list, and add the candidate to the list of candidates with bi-predicted motion vectors. In a different way, for example, the video encoder 20 and / or the video decoder 30 may, in response to the determination that the current frame portion corresponds to a B slice and uni-predicted motion vector information must be included in the motion vector information candidate list, add bi-predicted motion vector information to the motion vector information candidate list that corresponds to the uni-predicted motion vector. For example, video encoder 20 and / or video decoder 30 can generate bi-predicted motion vector information to indicate the first motion vector (for example, predicted from LO or Ll only) and a second vector of motion that corresponds to the first motion vector with an opposite sign. In some examples, the video encoder and / or video decoder 30 may denote the LO motion vector as MVO and the motion vector from L1 is unavailable, and may define an artificial motion vector from 11 'to -MVO with the reference index set to 0, and vice versa. [0153] [0153] The video encoder 20 and / or video decoder 30 can create the unavailable motion vector based on the relative time distance for the current frame. 
[0153] [0153] The video encoder 20 and/or video decoder 30 can also create the unavailable motion vector based on the relative temporal distances to the current frame. For example, video encoder 20 and/or video decoder 30 may denote the L0 motion vector as MV0, the temporal distance from the L0 reference frame to the current frame as POC0, and the temporal distance from the reference frame in L1 with reference index 0 to the current frame as POC1. The artificial motion vector for L1 can then be written as a function of MV0 and these two temporal distances. [0154] [0154] Instead of always using reference index 0 for the unavailable reference list (List0/List1), video encoder 20 and/or video decoder 30 can select the index value based on the QP values of the images in the unavailable list. The video encoder 20 and/or the video decoder 30 can use the image associated with the lowest average QP values as the reference index. Alternatively or additionally, the video encoder 20 and/or the video decoder 30 can select the index value with the smallest POC difference or the smallest temporal layer index. Alternatively or additionally, the video encoder 20 can signal the reference index in a slice header, PPS, SPS or at the block level. [0155] [0155] Video encoder 20 can determine the number of candidates to be signaled at the slice level. Alternatively or additionally, the signaling of the number of candidates may be mode dependent. For example, the signaling for the IC and non-IC cases can be different. This includes, but is not limited to, the number of FRUC TM propagation candidates when IC is enabled being 2, and the number of FRUC TM candidates in the non-IC case being 4. [0156] [0156] In a second technique, the video encoder 20 and/or the video decoder 30 can use FRUC bilateral matching to perform the motion refinement done by bilateral model matching. That is, for example, video encoder 20 and/or video decoder 30 can refine, based on one or more of the bilateral match or the model match, the initial motion vector information to determine refined motion vector information that indicates a refined position in the reference frame that is within a search range from the starting position. More specifically, for example, video encoder 20 and/or video decoder 30 can refine the motion trajectory based on a matching difference between the first starting position and the second starting position. The video encoder 20 and/or the video decoder 30 can thus move the original FRUC bilateral matching from a separate FRUC mode to the motion vector refinement of the regular blending mode. [0157] [0157] Instead of creating the bilateral model and performing the motion refinement as described in bilateral model matching, for example, in X. Chen, J. An, J. Zheng, "EE3: Decoder-Side Motion Vector Refinement Based on Bilateral Template Matching", JVET-E0052, January 2017, video encoder 20 and/or video decoder 30 can use bilateral matching as described in US patent publication No. US-2016-0286230. Note that, according to the scheme illustrated in Figure 9, video encoder 20 and/or video decoder 30 can perform the search between a region in Ref0 and a region in Ref1. The search range of the motion refinement can be set to 8; alternatively, the search range can be signaled using a higher-level syntax element. Video encoder 20 and/or video decoder 30 can use full-pixel searches that are performed iteratively until no further update occurs or the search range limit is reached, followed by half-pixel searches using the same stopping criterion.
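The iterative full-pixel followed by half-pixel refinement described in the preceding paragraph can be sketched as follows, where cost is assumed to be a caller-provided function that evaluates the bilateral matching cost at a (possibly fractional) motion vector:

```python
def refine_mv(start_mv, cost, search_range=8):
    """Move one step at a time toward the lowest-cost neighbor until no neighbor
    improves the cost or the search range limit is reached; the full-pixel stage
    is followed by a half-pixel stage with the same stopping criterion."""
    def local_search(mv, step):
        best_mv, best_cost = mv, cost(mv)
        moved = True
        while moved:
            moved = False
            for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step)):
                cand = (best_mv[0] + dx, best_mv[1] + dy)
                if (abs(cand[0] - start_mv[0]) > search_range or
                        abs(cand[1] - start_mv[1]) > search_range):
                    continue                      # respect the search range limit
                c = cost(cand)
                if c < best_cost:
                    best_mv, best_cost, moved = cand, c, True
        return best_mv

    mv = local_search(start_mv, step=1)           # full-pixel searches
    return local_search(mv, step=0.5)             # half-pixel searches
```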
[0158] [0158] The video encoder 20 and/or the video decoder 30 can perform the bilateral refinement in a mirrored manner. That is, for example, during the search for the refined motion vectors, the video encoder 20 and/or the video decoder 30 may apply the motion vector refinement with the opposite sign to the other list in order to perform the search. Stated differently, for example, video encoder 20 and/or video decoder 30 can modify a first motion vector of the motion trajectory that specifies the first starting position by a motion vector refinement and modify a second motion vector of the motion trajectory that specifies the second starting position by the motion vector refinement with an opposite sign. [0159] [0159] The video encoder 20 and/or video decoder 30 can define the two regions in a mirrored manner that takes the temporal distance into account. That is, for example, the video encoder 20 and/or the video decoder 30 can consider the temporal distances between Ref0, Ref1 and the current frame, and the video encoder 20 and/or the video decoder 30 can perform the scaling accordingly to obtain the motion vectors for both Ref0 and Ref1 (for example, similar to Equation (4)). Stated differently, for example, the video encoder 20 and/or video decoder 30 can scale the motion trajectory based on a temporal distance between the current frame and the first reference frame and a temporal distance between the current frame and the second reference frame. [0160] [0160] The video encoder 20 and/or video decoder 30 can also search the two regions separately, without imposing a mirrored restriction. Initially, video encoder 20 and/or video decoder 30 can fix MV0 and search for MV1, and then video encoder 20 and/or video decoder 30 can fix the best MV1 and search for MV0, and so on. This process can continue until there is no change in either MV0 or MV1. Stated differently, for example, video encoder 20 and/or video decoder 30 can refine a first motion vector of the motion trajectory that specifies the first starting position based on the matching difference between the first starting position and the second starting position, to generate a first refined motion vector, and refine a second motion vector of the motion trajectory that specifies the second starting position based on the first refined motion vector. [0161] [0161] The video encoder 20 and/or video decoder 30 can use a metric to perform the search for the motion vector refinement such as, but not limited to, the Sum of Absolute Differences (SAD), the mean-removed SAD (MR-SAD), the Sum of Squared Differences (SSD), the Normalized Cross Correlation (NCC) or the Structural Similarity Index (SSIM). Stated differently, for example, video encoder 20 and/or video decoder 30 can determine the matching difference between the first starting position and the second starting position based on a metric, where the metric comprises one or more of a SAD, an MR-SAD, an SSD, an NCC or an SSIM. [0162] [0162] The video encoder 20 and/or video decoder 30 can select the metric based on the block size. Stated differently, for example, the video encoder 20 and/or the video decoder 30 can select the metric from a plurality of metrics based on a current block size. For large blocks, for example, video encoder 20 and/or video decoder 30 can use MR-SAD, NCC or SSIM. Stated differently, for example, video encoder 20 and/or video decoder 30 can select the metric according to whether the current block size is large or small.
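The matching metrics and the block-size-based selection described above can be illustrated as follows (the block-size threshold is an assumption for illustration; SSD, NCC and SSIM could be added in the same way):

```python
import numpy as np

def sad(a, b):
    """Sum of Absolute Differences."""
    return np.abs(a.astype(np.int32) - b.astype(np.int32)).sum()

def mr_sad(a, b):
    """Mean-removed SAD: more robust to illumination changes between the blocks."""
    a32, b32 = a.astype(np.int32), b.astype(np.int32)
    return np.abs((a32 - a32.mean()) - (b32 - b32.mean())).sum()

def matching_cost(a, b, large_block_threshold=64):
    # Larger blocks use the mean-removed metric; smaller blocks use plain SAD.
    use_mr = a.shape[0] * a.shape[1] >= large_block_threshold
    return mr_sad(a, b) if use_mr else sad(a, b)
```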
[0163] [0163] In a third technique, for FRUC model matching, video encoder 20 and/or video decoder 30 can selectively disable the sub-block motion refinement in order to reduce the extra propagations introduced by the sub-block motion search. For example, video encoder 20 and/or video decoder 30 can add a slice-level switch to determine whether the sub-block motion refinement is enabled. Video encoder 20 can make such a decision based on statistics of the previous frames. For example, if the average block size of the previous frame is greater than a threshold, video encoder 20 can enable the sub-block motion refinement. In some examples, if the average block size of the previous frame is not greater than the threshold, video encoder 20 can disable the sub-block motion refinement. In some examples, video encoder 20 may disable the sub-block motion refinement completely. [0164] [0164] Video encoder 20 can also partially disable the sub-block motion refinement. For example, for the sub-blocks closest to the upper-left positions, video encoder 20 can disable the sub-block motion refinement, while for those closest to the lower-right positions, video encoder 20 can enable the sub-block motion refinement. [0165] [0165] The techniques mentioned above can be applied to certain block sizes and/or encoding modes. [0166] [0166] Figure 13 is a block diagram illustrating an example video encoder 20 that can implement the techniques of this disclosure. Figure 13 is provided for explanatory purposes and should not be considered limiting of the techniques as broadly exemplified and described in this disclosure. The techniques of this disclosure may be applicable to other standards or coding methods. [0167] [0167] In the example of Figure 13, video encoder 20 includes a prediction processing unit 400, a video data memory 401, a residual generation unit 402, a transform processing unit 404, a quantization unit 406, an inverse quantization unit 408, an inverse transform processing unit 410, a reconstruction unit 412, a filter unit 414, a decoded image buffer 416 and an entropy encoding unit 418. The prediction processing unit 400 includes an inter-prediction processing unit 420 and an intra-prediction processing unit 426. The inter-prediction processing unit 420 may include a motion estimation unit and a motion compensation unit (not shown). The video encoder 20 can be configured to perform one or more techniques described in this document for implementing FRUC. [0168] [0168] The video data memory 401 can be configured to store video data to be encoded by the components of video encoder 20. The video data stored in the video data memory 401 can be obtained, for example, from the video source 18. The decoded image buffer 416 may be a reference image memory that stores reference video data for use in the encoding of video data by video encoder 20, for example, in intra- or inter-coding modes. The video data memory 401 and the decoded image buffer 416 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM) or other types of memory devices. The video data memory 401 and the decoded image buffer 416 may be provided by the same memory device or by separate memory devices. In various examples, the video data memory 401 may be on-chip with the other components of video encoder 20 or off-chip with respect to those components. The video data memory 401 can be the same as, or part of, the storage media 19 of Figure 1. [0169] [0169] Video encoder 20 receives video data. Video encoder 20 can encode each CTU in a slice of an image of the video data. Each of the CTUs can be associated with equally sized luma coding tree blocks (CTBs) and corresponding CTBs of the image.
As part of encoding a CTU, the prediction processing unit 400 can perform partitioning to divide the CTBs of the CTU into progressively smaller blocks. The smaller blocks can be the coding blocks of CUs. For example, the prediction processing unit 400 can partition a CTB associated with a CTU according to a tree structure. [0170] [0170] Video encoder 20 can encode the CUs of a CTU to generate encoded representations of the CUs (that is, encoded CUs). As part of encoding a CU, the prediction processing unit 400 can partition the coding blocks associated with the CU among one or more PUs of the CU. In this way, each PU can be associated with a luma prediction block and corresponding chroma prediction blocks. Video encoder 20 and video decoder 30 can support PUs that have different sizes. As indicated above, the size of a CU can refer to the size of the CU's luma coding block, and the size of a PU can refer to the size of a luma prediction block of the PU. Assuming that the size of a particular CU is 2Nx2N, video encoder 20 and video decoder 30 can support PU sizes of 2Nx2N or NxN for intra-prediction, and symmetric PU sizes of 2Nx2N, 2NxN, Nx2N, NxN or similar for inter-prediction. Video encoder 20 and video decoder 30 can also support asymmetric partitioning for PU sizes of 2NxnU, 2NxnD, nLx2N and nRx2N for inter-prediction. [0171] [0171] The inter-prediction processing unit 420 can generate predictive data for a PU by performing inter-prediction on each PU of a CU. The predictive data for the PU can include predictive blocks for the PU and motion information for the PU. The inter-prediction processing unit 420 can perform different operations for a PU of a CU depending on whether the PU is in an I slice, a P slice or a B slice. In an I slice, all PUs are intra-predicted. Therefore, if the PU is in an I slice, the inter-prediction processing unit 420 does not perform inter-prediction on the PU. [0172] [0172] The intra-prediction processing unit 426 can generate predictive data for a PU by performing intra-prediction on the PU. The predictive data for the PU can include predictive blocks of the PU and various syntax elements. The intra-prediction processing unit 426 can perform intra-prediction on PUs in I slices, P slices and B slices. [0173] [0173] To perform intra-prediction on a PU, the intra-prediction processing unit 426 can use multiple intra-prediction modes to generate multiple sets of predictive data for the PU. The intra-prediction processing unit 426 can use samples from the sample blocks of neighboring PUs to generate a predictive block for a PU. The neighboring PUs can be above, above and to the right, above and to the left, or to the left of the PU, assuming a left-to-right, top-to-bottom encoding order for PUs, CUs and CTUs. The intra-prediction processing unit 426 can use various numbers of intra-prediction modes, for example, 33 directional intra-prediction modes. In some instances, the number of intra-prediction modes may depend on the size of the region associated with the PU. [0174] [0174] The prediction processing unit 400 can select the predictive data for the PUs of a CU from among the predictive data generated by the inter-prediction processing unit 420 for the PUs or the predictive data generated by the intra-prediction processing unit 426 for the PUs. In some examples, the prediction processing unit 400 selects the predictive data for the PUs of the CU based on rate/distortion metrics of the sets of predictive data.
The predictive blocks of the selected predictive data can be referred to in this document as the selected predictive blocks. The prediction processing unit 400 can be configured to perform one or more techniques described in this document to determine the initial motion vector information from the candidate list of motion vector information for signaling. [0175] [0175] The residual generation unit 402 can generate, based on the coding blocks (for example, luma, Cb and Cr coding blocks) for a CU and the selected predictive blocks (for example, predictive luma, Cb and Cr blocks) for the PUs of the CU, residual blocks (for example, residual luma, Cb and Cr blocks) for the CU. For example, the residual generation unit 402 can generate the residual blocks of the CU so that each sample in the residual blocks has a value equal to the difference between a sample in a coding block of the CU and a corresponding sample in a corresponding selected predictive block of a PU of the CU. [0176] [0176] The transform processing unit 404 can perform quadtree partitioning to partition the residual blocks associated with a CU into transform blocks associated with the TUs of the CU. In this way, a TU can be associated with a luma transform block and two chroma transform blocks. The sizes and positions of the luma and chroma transform blocks of the TUs of a CU may or may not be based on the sizes and positions of the prediction blocks of the PUs of the CU. A quadtree structure known as a "residual quadtree" (RQT) can include nodes associated with each of the regions. The TUs of a CU can correspond to the leaf nodes of the RQT. [0177] [0177] The transform processing unit 404 can generate transform coefficient blocks for each TU of a CU by applying one or more transforms to the transform blocks of the TU. The transform processing unit 404 can apply various transforms to a transform block associated with a TU. For example, the transform processing unit 404 can apply a discrete cosine transform (DCT), a directional transform or a conceptually similar transform to a transform block. In some examples, the transform processing unit 404 does not apply transforms to a transform block. In such examples, the transform block can be treated as a transform coefficient block. [0178] [0178] The quantization unit 406 can quantize the transform coefficients in a coefficient block. The quantization process can reduce the bit depth associated with some or all of the transform coefficients. For example, an n-bit transform coefficient can be rounded down to an m-bit transform coefficient during quantization, where n is greater than m. The quantization unit 406 can quantize a coefficient block associated with a TU of a CU based on a quantization parameter (QP) value associated with the CU. Video encoder 20 can adjust the degree of quantization applied to the coefficient blocks associated with a CU by adjusting the QP value associated with the CU. Quantization can introduce loss of information; thus, the quantized transform coefficients may be less accurate than the original ones. [0179] [0179] The inverse quantization unit 408 and the inverse transform processing unit 410 can apply inverse quantization and inverse transforms to a coefficient block, respectively, to reconstruct a residual block from the coefficient block. The reconstruction unit 412 can add the reconstructed residual block to corresponding samples of one or more predictive blocks generated by the prediction processing unit 400 to produce a reconstructed transform block associated with a TU.
By reconstructing the transform blocks for each TU of a CU in this way, the video encoder 20 can reconstruct the coding blocks of the CU. [0180] [0180] The filter unit 414 can perform one or more deblocking operations to reduce blocking artifacts in the coding blocks associated with a CU. The decoded image buffer 416 can store the reconstructed coding blocks after the filter unit 414 performs the one or more deblocking operations on the reconstructed coding blocks. The inter-prediction processing unit 420 can use a reference image containing the reconstructed coding blocks to perform inter-prediction on PUs of other images. In addition, the intra-prediction processing unit 426 can use reconstructed coding blocks in the decoded image buffer 416 to perform intra-prediction on other PUs in the same image as the CU. [0181] [0181] The entropy encoding unit 418 can receive data from other functional components of video encoder 20. For example, the entropy encoding unit 418 can receive coefficient blocks from the quantization unit 406 and can receive syntax elements from the prediction processing unit 400. The entropy encoding unit 418 can perform one or more entropy encoding operations on the data to generate entropy-encoded data. For example, the entropy encoding unit 418 can perform a CABAC operation, a context-adaptive variable-length coding (CAVLC) operation, a variable-to-variable length coding (V2V) operation, a syntax-based context-adaptive binary arithmetic coding (SBAC) operation, a probability interval partitioning entropy (PIPE) coding operation, an Exponential-Golomb coding operation, or another type of entropy coding operation on the data. Video encoder 20 can output a bit stream that includes the entropy-encoded data generated by the entropy encoding unit 418. For example, the bit stream can include data representing values of transform coefficients for a CU. [0182] [0182] Figure 14 is a block diagram illustrating an example video decoder 30 that is configured to implement the techniques of this disclosure. Figure 14 is provided for explanatory purposes and is not limiting of the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes the video decoder 30 in the context of HEVC encoding. However, the techniques of this disclosure may be applicable to other standards or coding methods. [0183] [0183] In the example of Figure 14, video decoder 30 includes an entropy decoding unit 450, a video data memory 451, a prediction processing unit 452, an inverse quantization unit 454, an inverse transform processing unit 456, a reconstruction unit 458, a filter unit 460 and a decoded image buffer 462. The prediction processing unit 452 includes a motion compensation unit 464 and an intra-prediction processing unit 466. In other examples, video decoder 30 may include more functional components, fewer functional components or different functional components. The video decoder 30 can be configured to perform one or more techniques described in this document for implementing FRUC. [0184] [0184] The video data memory 451 can store encoded video data, such as an encoded video bit stream, to be decoded by the components of video decoder 30. The video data stored in the video data memory 451 can be obtained, for example, from computer-readable media 16, for example, from a local video source, such as a camera, through wired or wireless network communication of video data, or by accessing physical data storage media.
The video data memory 451 can form a coded picture buffer (CPB) that stores encoded video data from an encoded video bit stream. The decoded image buffer 462 may be a reference image memory that stores reference video data for use in decoding video data by the video decoder 30, for example, in intra- or inter-coding modes, or for output. The video data memory 451 and the decoded image buffer 462 can be formed by any of a variety of memory devices, such as DRAM, including SDRAM, MRAM, RRAM or other types of memory devices. The video data memory 451 and the decoded image buffer 462 can be provided by the same memory device or by separate memory devices. In several examples, the video data memory 451 may be on-chip with other components of the video decoder 30 or off-chip with respect to those components. The video data memory 451 can be the same as, or part of, the storage media 28 of Figure 1.

[0185] The video data memory 451 receives and stores encoded video data (for example, NAL units) from a bit stream. The entropy decoding unit 450 can receive encoded video data (for example, NAL units) from the video data memory 451 and can parse the NAL units for syntax elements. The entropy decoding unit 450 can entropy decode entropy-encoded syntax elements in the NAL units. The prediction processing unit 452, the inverse quantization unit 454, the inverse transform processing unit 456, the reconstruction unit 458 and the filter unit 460 can generate decoded video data based on the syntax elements extracted from the bit stream. The entropy decoding unit 450 can perform a process generally reciprocal to that of the entropy coding unit 418. The prediction processing unit 452 can be configured to perform one or more techniques described in this document to use initial motion vector information, from the candidate list of motion vector information, that is included in signaling information.

[0186] In addition to obtaining syntax elements from the bit stream, the video decoder 30 can perform a reconstruction operation on a non-partitioned CU. To perform the reconstruction operation on a CU, the video decoder 30 can perform a reconstruction operation on each TU of the CU. By performing the reconstruction operation for each TU of the CU, the video decoder 30 can reconstruct the residual blocks of the CU.

[0187] As part of performing a reconstruction operation on a TU of a CU, the inverse quantization unit 454 can inverse quantize, that is, de-quantize, coefficient blocks associated with the TU. After the inverse quantization unit 454 inverse quantizes a coefficient block, the inverse transform processing unit 456 can apply one or more inverse transforms to the coefficient block in order to generate a residual block associated with the TU. For example, the inverse transform processing unit 456 can apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loeve transform (KLT), or another inverse transform to the coefficient block.

[0188] The inverse quantization unit 454 can perform particular techniques of this disclosure. For example, for at least one respective quantization group of a plurality of quantization groups within a CTB of a CTU of an image of the video data, the inverse quantization unit 454 may derive, based at least in part on local quantization information signaled in the bit stream, a respective quantization parameter for the respective quantization group.
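As a rough illustration, the following sketch derives a quantization parameter for a quantization group from locally signaled information and then de-quantizes the coefficient levels of a transform block. The delta-QP signaling model and the step-size mapping (the same illustrative mapping as in the earlier sketch) are simplified assumptions, not the normative derivation.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical local quantization information for one quantization group:
// a predicted QP plus a delta signaled in the bit stream for this group.
struct QuantGroupInfo {
  int predictedQp = 32;     // e.g., predicted from neighboring groups or the slice QP
  int signaledDeltaQp = 0;  // parsed from the bit stream for this group
};

// Derive the group QP and clip it to a legal range.
static int DeriveGroupQp(const QuantGroupInfo& info, int minQp, int maxQp) {
  int qp = info.predictedQp + info.signaledDeltaQp;
  if (qp < minQp) qp = minQp;
  if (qp > maxQp) qp = maxQp;
  return qp;
}

// De-quantize the coefficient levels of a transform block using a step size derived
// from the group QP (illustrative mapping, not the HEVC scaling-list based rule).
static std::vector<int32_t> InverseQuantize(const std::vector<int32_t>& levels, int qp) {
  const double step = std::pow(2.0, (qp - 4) / 6.0);
  std::vector<int32_t> coeffs(levels.size());
  for (size_t i = 0; i < levels.size(); ++i)
    coeffs[i] = static_cast<int32_t>(std::lround(levels[i] * step));
  return coeffs;
}
```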
In addition, in this example, the inverse quantization unit 454 can inverse quantize, based on the respective quantization parameter for the respective quantization group, at least one transform coefficient of a transform block of a TU of a CU of the CTU. In this example, the respective quantization group is defined as a group of successive CUs or coding blocks, in coding order, such that boundaries of the respective quantization group must be boundaries of the CUs or coding blocks and a size of the respective quantization group is greater than or equal to a threshold. The video decoder 30 (for example, the inverse transform processing unit 456, the reconstruction unit 458 and the filter unit 460) can reconstruct, based on inverse quantized transform coefficients of the transform block, a coding block of the CU.

[0189] If a PU is encoded using intra-prediction, the intra-prediction processing unit 466 can perform intra-prediction to generate predictive blocks for the PU. The intra-prediction processing unit 466 can use an intra-prediction mode to generate the predictive blocks of the PU based on samples from spatially neighboring blocks. The intra-prediction processing unit 466 can determine the intra-prediction mode for the PU based on one or more syntax elements obtained from the bit stream.

[0190] If a PU is encoded using inter-prediction, the entropy decoding unit 450 can determine motion information for the PU. The motion compensation unit 464 can determine, based on the motion information of the PU, one or more reference blocks. The motion compensation unit 464 can generate, based on the one or more reference blocks, predictive blocks (for example, luma, Cb and Cr predictive blocks) for the PU.

[0191] The reconstruction unit 458 can use transform blocks (for example, luma, Cb and Cr transform blocks) for the TUs of the CU and the predictive blocks (for example, luma, Cb and Cr predictive blocks) for the PUs of the CU, that is, intra-prediction data or inter-prediction data, as applicable, to reconstruct the coding blocks (for example, luma, Cb and Cr coding blocks) for the CU. For example, the reconstruction unit 458 can add samples from the transform blocks (for example, luma, Cb and Cr transform blocks) to the corresponding samples from the predictive blocks (for example, luma, Cb and Cr predictive blocks) to reconstruct the coding blocks (for example, luma, Cb and Cr coding blocks) of the CU.

[0192] The filter unit 460 can perform a deblocking operation to reduce blocking artifacts associated with the coding blocks of the CU. The video decoder 30 can store the coding blocks of the CU in the decoded image buffer 462. The decoded image buffer 462 can provide reference images for subsequent motion compensation, intra-prediction and presentation on a display device, such as the display device 32 of Figure 1. For example, the video decoder 30 can perform, based on blocks in the decoded image buffer 462, intra-prediction or inter-prediction operations for PUs of other CUs.

[0193] Figure 15 is a block diagram that illustrates an exemplary method for decoding video according to one or more techniques described in this disclosure. Initially, the video decoder 30 receives a bit stream that includes one or more symbols representing a residual block and signaling information indicating initial motion vector information (502). The video decoder 30 builds a candidate list of motion vector information for a portion of a current frame (504).
The video decoder 30 refines, based on one or more of bilateral correspondence or model correspondence, the initial motion vector information to determine refined motion vector information that indicates a refined position in the reference frame that is within a search range from the starting position (506). The video decoder 30 generates a predictive block based on the refined motion vector information (508). The video decoder 30 decodes the current frame based on the predictive block (510).

[0194] Figure 16 is a block diagram that illustrates an exemplary method for encoding video according to one or more techniques described in this disclosure. Initially, the video encoder 20 builds a candidate list of motion vector information for a portion of a current frame (552). The video encoder 20 selects initial motion vector information from the candidate list of motion vector information, the initial motion vector information indicating a starting position in a reference frame (554). The video encoder 20 refines, based on one or more of bilateral correspondence or model correspondence, the initial motion vector information to determine refined motion vector information that indicates a refined position in the reference frame that is within a search range from the starting position (556). The video encoder 20 generates a predictive block based on the refined motion vector information (558). The video encoder 20 generates residual sample values for the current block of video data based on the predictive block (560). The video encoder 20 outputs a bit stream that includes one or more symbols representing an indication of the residual sample values and signaling information indicating the initial motion vector information from the candidate list of motion vector information (562). An illustrative sketch of the refinement step is provided below.

[0195] Certain aspects of this disclosure have been described with respect to extensions of the HEVC standard for purposes of illustration. However, the techniques described in this disclosure may be useful for other video coding processes, including other standard or proprietary video coding processes not yet developed.

[0196] A video coder, as described in this disclosure, can refer to a video encoder or a video decoder. Similarly, a video coding unit can refer to a video encoder or a video decoder. Similarly, video coding can refer to video encoding or video decoding, as applicable. In this disclosure, the phrase "based on" may indicate based only on, based at least in part on, or based in some way on. This disclosure may use the term "video unit", "video block" or "block" to refer to one or more sample blocks and syntax structures used to encode samples of the one or more sample blocks. Exemplary types of video units can include CTUs, CUs, PUs, transform units (TUs), macroblocks, macroblock partitions, and so on.

[0197] It should be recognized that, depending on the example, certain actions or events of any of the techniques described in this document can be performed in a different sequence, and can be added, merged or left out entirely (for example, not all described actions or events are necessary for the practice of the techniques). In addition, in certain examples, actions or events can be performed concurrently, for example, through multi-threaded processing, interrupt processing or multiple processors, instead of sequentially.
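To illustrate the refinement described with respect to Figures 15 and 16, the following minimal C++ sketch refines an initial motion vector pair with an integer-pel bilateral-correspondence search using a SAD cost, under simplified assumptions: the plane abstraction, types and function names are hypothetical, sub-pel interpolation is omitted, and the exact search pattern of the disclosure is not reproduced.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// Minimal reference-picture abstraction; real codecs use padded pictures and
// sub-pel interpolation filters, which are omitted here.
struct Plane {
  int width = 0;
  int height = 0;
  std::vector<uint8_t> samples;  // row-major luma samples

  uint8_t at(int x, int y) const {
    x = std::min(std::max(x, 0), width - 1);   // clamp instead of real padding
    y = std::min(std::max(y, 0), height - 1);
    return samples[static_cast<size_t>(y) * static_cast<size_t>(width) + x];
  }
};

struct MotionVector { int x = 0; int y = 0; };  // integer-pel for this sketch

// Bilateral matching cost: SAD between the two reference blocks that the
// candidate motion trajectory points to in the two reference frames.
static int64_t BilateralSad(const Plane& ref0, const Plane& ref1,
                            int blockX, int blockY, int blockW, int blockH,
                            MotionVector mv0, MotionVector mv1) {
  int64_t sad = 0;
  for (int y = 0; y < blockH; ++y) {
    for (int x = 0; x < blockW; ++x) {
      int a = ref0.at(blockX + x + mv0.x, blockY + y + mv0.y);
      int b = ref1.at(blockX + x + mv1.x, blockY + y + mv1.y);
      sad += std::abs(a - b);
    }
  }
  return sad;
}

// Refine the signaled starting motion vector pair by testing integer offsets
// within +/- searchRange of the starting position; the offset on mv0 is
// mirrored onto mv1 so the pair stays on one motion trajectory.
static void RefineBilateral(const Plane& ref0, const Plane& ref1,
                            int blockX, int blockY, int blockW, int blockH,
                            int searchRange, MotionVector& mv0, MotionVector& mv1) {
  MotionVector best0 = mv0;
  MotionVector best1 = mv1;
  int64_t bestCost = std::numeric_limits<int64_t>::max();
  for (int dy = -searchRange; dy <= searchRange; ++dy) {
    for (int dx = -searchRange; dx <= searchRange; ++dx) {
      const MotionVector cand0{mv0.x + dx, mv0.y + dy};
      const MotionVector cand1{mv1.x - dx, mv1.y - dy};  // opposite-sign refinement
      const int64_t cost =
          BilateralSad(ref0, ref1, blockX, blockY, blockW, blockH, cand0, cand1);
      if (cost < bestCost) {
        bestCost = cost;
        best0 = cand0;
        best1 = cand1;
      }
    }
  }
  mv0 = best0;
  mv1 = best1;
}
```

In this sketch the offset tested on the first motion vector is applied with an opposite sign to the second motion vector, so that both vectors remain on a common motion trajectory through the current block; a predictive block could then be formed, for example, by averaging the two motion-compensated reference blocks addressed by the refined pair.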
[0198] In one or more examples, the functions described can be implemented in hardware, software, firmware or any combination thereof. If implemented in software, the functions can be stored on or transmitted over a computer-readable medium, as one or more instructions or code, and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media, which include any medium that facilitates the transfer of a computer program from one place to another, for example, according to a communication protocol. In this way, computer-readable media can, in general, correspond to (1) tangible computer-readable storage media that are non-transitory or (2) a communication medium such as a signal or carrier wave. The data storage media can be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. A computer program product may include a computer-readable medium.

[0199] By way of example, and not by way of limitation, such computer-readable storage media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server or other remote source using a coaxial cable, a fiber optic cable, a twisted pair, a digital subscriber line (DSL) or wireless technologies such as infrared, radio and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL or wireless technologies such as infrared, radio and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals or other transient media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used in this document, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks normally reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

[0200] Instructions can be executed by fixed-function and/or programmable processing circuitry, which includes one or more processors, such as one or more DSPs, general-purpose microprocessors, ASICs, FPGAs or other equivalent integrated or discrete logic circuitry. Consequently, the term "processor", as used in this document, can refer to any of the aforementioned structures or any other structure suitable for the implementation of the techniques described in this document. In addition, in some respects, the functionality described in this document may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. In addition, the techniques could be fully implemented in one or more circuits or logic elements.

[0201] The techniques of this disclosure can be implemented in a wide variety of devices or apparatuses, including a handset, an integrated circuit (IC) or a set of ICs (for example, a chip set).
Various components, modules or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units can be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

[0202] Several examples have been described. These and other examples are covered by the scope of the following claims.
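As a further illustration of how the candidate list of motion vector information described above might be populated for a B slice, the following minimal C++ sketch derives a bi-predicted candidate from a uni-predicted one, either by mirroring the motion vector with an opposite sign or by scaling it according to the POC distances of the two reference frames; the types, names and integer rounding are hypothetical simplifications rather than the normative construction process.

```cpp
#include <vector>

struct MotionVector { int x = 0; int y = 0; };

struct Candidate {
  MotionVector mv0;          // motion toward the first reference frame (list 0)
  MotionVector mv1;          // motion toward the second reference frame (list 1)
  bool biPredicted = false;
};

// Derive a bi-predicted candidate by mirroring the single motion vector with an
// opposite sign, as described for B slices above.
Candidate MirrorToBiPredicted(const MotionVector& mv0) {
  return Candidate{mv0, MotionVector{-mv0.x, -mv0.y}, true};
}

// Alternative derivation that scales the second vector by the ratio of the temporal
// (POC) distances of the two reference frames to the current frame; poc0 is assumed
// non-zero and the integer rounding is illustrative only.
Candidate ScaleToBiPredicted(const MotionVector& mv0, int poc0, int poc1) {
  MotionVector mv1{(mv0.x * poc1) / poc0, (mv0.y * poc1) / poc0};
  return Candidate{mv0, mv1, true};
}

// Candidate-list construction for a portion of the current frame: uni-predicted
// entries are converted to bi-predicted entries when the portion belongs to a B
// slice, so that bilateral refinement has a motion trajectory to start from.
std::vector<Candidate> BuildCandidateList(const std::vector<MotionVector>& uniCandidates,
                                          bool isBSlice, int poc0, int poc1) {
  std::vector<Candidate> list;
  for (const MotionVector& mv : uniCandidates) {
    if (isBSlice)
      list.push_back(poc0 != 0 ? ScaleToBiPredicted(mv, poc0, poc1)
                               : MirrorToBiPredicted(mv));
    else
      list.push_back(Candidate{mv, MotionVector{}, false});
  }
  return list;
}
```

Providing a bi-predicted candidate in this way gives the bilateral refinement described above a motion trajectory to start from even when only a uni-predicted vector is available.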
Claims (34)

[1] 1. Method for decoding video data, the method comprising: building, using a video decoder implemented in processing circuitry, a candidate list of motion vector information for a portion of a current frame; receiving, through the video decoder, signaling information indicating initial motion vector information from the candidate list of motion vector information, the initial motion vector information indicating a starting position in a reference frame; refining, by means of the video decoder, based on one or more of bilateral correspondence or model correspondence, the initial motion vector information to determine refined motion vector information that indicates a refined position in the reference frame that is within a search range from the starting position; generating, through the video decoder, a predictive block based on the refined motion vector information; and decoding, through the video decoder, the current frame based on the predictive block.

[2] 2. Method according to claim 1, in which building the candidate list of motion vector information comprises: determining, using the video decoder, a motion vector precision for the refined motion vector information.

[3] 3. Method according to claim 1, in which the portion of the current frame is a current block of the current frame, the method additionally comprising: calculating mvd = 4 << MVprecision, wherein mvd = 4 << (MVprecision - 2) when (W · H < 64) and mvd = 4 << (MVprecision - 1) when (W · H < 256), where MVprecision represents the motion vector precision, W is a width of the current block, and H is a height of the current block.

[4] 4. Method according to claim 1, in which building the candidate list of motion vector information comprises: in response to determining that the portion of the current frame corresponds to a B slice and that uni-predicted motion vector information is to be included in the candidate list of motion vector information, adding, to the candidate list of motion vector information, bi-predicted motion vector information that corresponds to the uni-predicted motion vector information.

[5] 5. Method according to claim 4, in which the uni-predicted motion vector information indicates a first motion vector and in which adding the bi-predicted motion vector information to the candidate list of motion vector information comprises: generating the bi-predicted motion vector information to indicate the first motion vector and a second motion vector that corresponds to the first motion vector with an opposite sign.

[6] 6. Method according to claim 4, in which the uni-predicted motion vector information indicates a first motion vector (MV0) for a first reference frame; in which the bi-predicted motion vector information indicates the first motion vector and a second motion vector (MV1) for a second reference frame; and in which adding the bi-predicted motion vector information comprises calculating MV1 = (POC1 / POC0) · MV0, where POC0 represents a temporal distance from the first reference frame to the current frame and POC1 represents a temporal distance from the second reference frame to the current frame.

[7] 7.
Method according to claim 1, in which the starting position is a first starting position, in which the reference frame is a first reference frame, in which the initial motion vector information indicates a motion path that extends from the first starting position in the first reference frame, through the current block of the current frame, to a second starting position in a second reference frame, and in which refining the initial motion vector information comprises: refining the motion path based on a difference in correspondence between the first starting position and the second starting position.

[8] 8. Method according to claim 7, in which refining the motion path comprises: modifying a first motion vector of the motion path, which specifies the first starting position, by a motion vector refinement; and modifying a second motion vector of the motion path, which specifies the second starting position, by the motion vector refinement with an opposite sign.

[9] 9. Method according to claim 7, in which refining the motion path comprises: scaling the motion path based on a temporal distance between the current frame and the first reference frame and a temporal distance between the current frame and the second reference frame.

[10] 10. Method according to claim 7, in which refining the motion path comprises: refining a first motion vector of the motion path, which specifies the first starting position, based on the difference in correspondence between the first starting position and the second starting position to generate a first refined motion vector; and refining a second motion vector of the motion path, which specifies the second starting position, based on the first refined motion vector.

[11] 11. Method according to claim 7, in which refining the motion path comprises: determining the difference in correspondence between the first starting position and the second starting position based on a metric, in which the metric comprises one or more of a Sum of Absolute Differences (SAD), a mean-removed SAD (MR-SAD), a Sum of Squared Differences (SSD), a Normalized Cross-Correlation (NCC) or a Structural Similarity Index (SSIM).

[12] 12. Method according to claim 11, in which refining the motion path comprises: selecting the metric from a plurality of metrics based on a size of the current block.

[13] 13. Method according to claim 11, in which refining the motion path comprises: selecting the metric as MR-SAD, NCC or SSIM when the size of the current block exceeds a block size threshold; and selecting the metric as SAD or SSE when the size of the current block does not exceed the block size threshold.

[14] 14. Method according to claim 1, in which the portion of the current frame corresponds to a current block of the current frame, a current coding unit of the current frame, or a plurality of coding units of the current frame.

[15] 15.
Device for decoding video data, the device comprising: a memory configured to store video data; and processing circuitry configured to: build a candidate list of motion vector information for a portion of a current frame; receive signaling information indicating initial motion vector information from the candidate list of motion vector information, the initial motion vector information indicating a starting position in a reference frame; refine, based on one or more of bilateral correspondence or model correspondence, the initial motion vector information to determine refined motion vector information that indicates a refined position in the reference frame that is within a search range from the starting position; generate a predictive block based on the refined motion vector information; and decode the current frame based on the predictive block.

[16] 16. Device according to claim 15, in which, to build the candidate list of motion vector information, the processing circuitry is configured to: determine a motion vector precision for the refined motion vector information.

[17] 17. Device according to claim 15, in which the portion of the current frame is a current block of the current frame and in which the processing circuitry is configured to: calculate mvd = 4 << MVprecision, wherein mvd = 4 << (MVprecision - 2) when (W · H < 64) and mvd = 4 << (MVprecision - 1) when (W · H < 256), where MVprecision represents the motion vector precision, W is a width of the current block, and H is a height of the current block.

[18] 18. Device according to claim 15, in which, to build the candidate list of motion vector information, the processing circuitry is configured to: in response to determining that the portion of the current frame corresponds to a B slice and that uni-predicted motion vector information is to be included in the candidate list of motion vector information, add, to the candidate list of motion vector information, bi-predicted motion vector information that corresponds to the uni-predicted motion vector information.

[19] 19. Device according to claim 18, in which the uni-predicted motion vector information indicates a first motion vector and in which, to add the bi-predicted motion vector information to the candidate list of motion vector information, the processing circuitry is configured to: generate the bi-predicted motion vector information to indicate the first motion vector and a second motion vector that corresponds to the first motion vector with an opposite sign.

[20] 20. Device according to claim 18, in which the uni-predicted motion vector information indicates a first motion vector (MV0) for a first reference frame; in which the bi-predicted motion vector information indicates the first motion vector and a second motion vector (MV1) for a second reference frame; and in which, to add the bi-predicted motion vector information, the processing circuitry is configured to calculate MV1 = (POC1 / POC0) · MV0, where POC0 represents a temporal distance from the first reference frame to the current frame and POC1 represents a temporal distance from the second reference frame to the current frame.

[21] 21.
Device according to claim 15, in which the starting position is a first starting position, in which the reference frame is a first reference frame, in which the initial motion vector information indicates a motion path that extends from the first starting position in the first reference frame, through the current block of the current frame, to a second starting position in a second reference frame, and in which, to refine the initial motion vector information, the processing circuitry is configured to: refine the motion path based on a difference in correspondence between the first starting position and the second starting position.

[22] 22. Device according to claim 21, in which, to refine the motion path, the processing circuitry is configured to: modify a first motion vector of the motion path, which specifies the first starting position, by a motion vector refinement; and modify a second motion vector of the motion path, which specifies the second starting position, by the motion vector refinement with an opposite sign.

[23] 23. Device according to claim 21, in which, to refine the motion path, the processing circuitry is configured to: scale the motion path based on a temporal distance between the current frame and the first reference frame and a temporal distance between the current frame and the second reference frame.

[24] 24. Device according to claim 21, in which, to refine the motion path, the processing circuitry is configured to: refine a first motion vector of the motion path, which specifies the first starting position, based on the difference in correspondence between the first starting position and the second starting position to generate a first refined motion vector; and refine a second motion vector of the motion path, which specifies the second starting position, based on the first refined motion vector.

[25] 25. Device according to claim 21, in which, to refine the motion path, the processing circuitry is configured to: determine the difference in correspondence between the first starting position and the second starting position based on a metric, in which the metric comprises one or more of a Sum of Absolute Differences (SAD), a mean-removed SAD (MR-SAD), a Sum of Squared Differences (SSD), a Normalized Cross-Correlation (NCC) or a Structural Similarity Index (SSIM).

[26] 26. Device according to claim 25, in which, to refine the motion path, the processing circuitry is configured to: select the metric from a plurality of metrics based on a size of the current block.

[27] 27. Device according to claim 25, in which, to refine the motion path, the processing circuitry is configured to: select the metric as MR-SAD, NCC or SSIM when the size of the current block exceeds a block size threshold; and select the metric as SAD or SSE when the size of the current block does not exceed the block size threshold.

[28] 28. Device according to claim 15, in which the portion of the current frame corresponds to a current block of the current frame, a current coding unit of the current frame, or a plurality of coding units of the current frame.

[29] 29. Device according to claim 15, in which the device comprises a wireless communication device, which further comprises a receiver configured to receive encoded video data.

[30] 30. Device according to claim 29, in which the wireless communication device comprises a telephone apparatus and in which the receiver is configured to demodulate, according to a wireless communication standard, a signal comprising the encoded video data.

[31] 31.
Method for encoding video data, the method comprising: building, using a video encoder implemented in processing circuitry, a candidate list of motion vector information for a portion of a current frame; selecting, using the video encoder, initial motion vector information from the candidate list of motion vector information, the initial motion vector information indicating a starting position in a reference frame; refining, by means of the video encoder, based on one or more of bilateral correspondence or model correspondence, the initial motion vector information to determine refined motion vector information that indicates a refined position in the reference frame that is within a search range from the starting position; generating, through the video encoder, a predictive block based on the refined motion vector information; generating, through the video encoder, residual sample values for the current block of video data based on the predictive block; and outputting, via the video encoder, an indication of the residual sample values and signaling information indicating the initial motion vector information from the candidate list of motion vector information.

[32] 32. Device for encoding video data, the device comprising: a memory configured to store video data; and processing circuitry configured to: build a candidate list of motion vector information for a portion of a current frame; select initial motion vector information from the candidate list of motion vector information, the initial motion vector information indicating a starting position in a reference frame; refine, based on one or more of bilateral correspondence or model correspondence, the initial motion vector information to determine refined motion vector information that indicates a refined position in the reference frame that is within a search range from the starting position; generate a predictive block based on the refined motion vector information; generate residual sample values for the current block of video data based on the predictive block; and output an indication of the residual sample values and signaling information indicating the initial motion vector information from the candidate list of motion vector information.

[33] 33. Device according to claim 32, in which the device comprises a wireless communication device, further comprising a transmitter configured to transmit encoded video data.

[34] 34. Device according to claim 33, in which the wireless communication device comprises a telephone apparatus and the transmitter is configured to modulate, according to a wireless communication standard, a signal comprising the encoded video data.